MMLV reverse transcriptase variants

ABSTRACT

Disclosed herein are compositions, methods, and kits comprising engineered reverse transcription enzymes that exhibit several desired properties such as thermal stability, processive reverse transcription, non-templated base addition, and template switching ability. The engineered reverse transcription enzymes described herein demonstrate unexpectedly higher resistance to cell lysate inhibition, greater ability to capture full-length mRNA transcripts, and improved results in small reaction volumes as compared to other engineered reverse transcription enzymes.

CROSS-REFERENCE

This application is a continuation of PCT Application Serial No. PCT/US2018/029641, filed Apr. 26, 2018, which claims the benefit of U.S. Provisional Application No. 62/490,492 filed Apr. 26, 2017, which are incorporated by reference herein in their entirety.

BACKGROUND

Significant advances in analyzing and characterizing biological and biochemical materials and systems have led to unprecedented advances in understanding the mechanisms of life, health, disease and treatment. Among these advances, technologies that target and characterize the genomic make up of biological systems have yielded some of the most groundbreaking results, including advances in the use and exploitation of genetic amplification technologies, and nucleic acid sequencing technologies.

Nucleic acid sequencing can be used to obtain information in a wide variety of biomedical contexts, including diagnostics, prognostics, biotechnology, and forensic biology. Sequencing may involve basic methods including Maxam-Gilbert sequencing and chain-termination methods, or de novo sequencing methods including shotgun sequencing and bridge PCR, or next-generation methods including polony sequencing, 454 pyrosequencing, Illumina sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, HeliScope single molecule sequencing, SMRT® sequencing, and others.

Despite these advances in biological characterization, many challenges still remain unaddressed, or relatively poorly addressed by the solutions currently being offered. The present disclosure provides novel solutions and approaches to addressing many of the shortcomings of existing technologies.

SUMMARY

Disclosed herein, in some embodiments, are engineered reverse transcription enzymes, comprising an amino acid sequence that is at least 80% identical to SEQ ID NO: 3, wherein said amino acid sequence comprises: (i) a truncation of at least 15 amino acids from the N-terminus relative to SEQ ID NO: 3; and (ii) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. In some instances, said one or more mutations are an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3. In some instances, said amino acid sequence comprises a plurality of mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme comprises: (i) three or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and an L603 mutation relative to SEQ ID NO: 3; and (ii) three or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3.
In some instances, said engineered reverse transcription enzyme comprises: (i) three or more mutations selected from the group consisting of an L139P mutation, a D200N mutation, a T330P mutation, a P448A mutation, a D449G mutation, a D524N or D524A mutation, and an L603W mutation relative to SEQ ID NO: 3; and (ii) three or more mutations selected from the group consisting of an E69K mutation, an E302R mutation, a T306K mutation, a W313F mutation, an L435G or L435K mutation, and an N454K mutation relative to SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme comprises: an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme comprises: an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3. In some instances, said truncation comprises a truncation of at least 20 amino acids from said N-terminus relative to SEQ ID NO: 3. In some instances, said truncation comprises a truncation of 23 amino acids from said N-terminus relative to SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme further comprises an affinity tag at said N-terminus or at a C-terminus of said amino acid sequence. In some instances, said affinity tag is at least 5 histidine amino acids.
In some instances, said engineered reverse transcription enzyme further comprises a protease cleavage sequence, wherein cleavage of said protease cleavage sequence by a protease results in cleavage of said affinity tag from said engineered reverse transcription enzyme. In some instances, said protease cleavage sequence is a thrombin cleavage sequence. In some instances, said amino acid sequence comprises a MRSSHHHHHHSSGLVPRGS (SEQ ID NO: 7) amino acid sequence at said N-terminus. In some instances, said engineered reverse transcription enzyme comprises an amino acid sequence according to SEQ ID NO: 6. In some instances, said engineered reverse transcription enzyme comprises an amino acid sequence according to SEQ ID NO: 5. In some instances, said engineered reverse transcription enzyme has improved ability to capture full-length transcripts as compared to a reverse transcriptase enzyme consisting of SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme has higher resistance to cell lysate as compared to a reverse transcriptase enzyme consisting of SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme has higher activity in a reaction volume of less than 1 nanoliter as compared to a reverse transcriptase enzyme consisting of SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme has increased thermal stability and reverse transcription processivity as compared to a reverse transcriptase enzyme consisting of SEQ ID NO: 3. In some instances, said engineered reverse transcription enzyme comprises terminal transferase activity and template switching ability.
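
The mutation notation used above (for example, E69K: glutamate at position 69 of the reference replaced by lysine) and the "three or more from each group" condition can be illustrated with a short sketch. The function names and the toy reference string below are hypothetical and are introduced only for illustration; the toy string is not SEQ ID NO: 3, and the sketch does not attempt the 80% identity calculation.

```python
import re

# Mutation sites as listed in the summary (positions relative to SEQ ID NO: 3).
GROUP_I = {"L139", "D200", "T330", "P448", "D449", "D524", "L603"}
GROUP_II = {"E69", "E302", "T306", "W313", "L435", "N454"}

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'E69K' into (original residue, position, new residue)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, new = match.groups()
    return original, int(position), new


def meets_three_plus_three(labels):
    """True if at least three sites fall in group (i) and at least three in group (ii)."""
    sites = set()
    for label in labels:
        original, position, _ = parse_mutation(label)
        sites.add(f"{original}{position}")
    return len(sites & GROUP_I) >= 3 and len(sites & GROUP_II) >= 3


def apply_mutations(reference, labels, truncation=0):
    """Apply substitutions at 1-based reference positions, then remove
    `truncation` residues from the N-terminus (hypothetical helper)."""
    residues = list(reference)
    for label in labels:
        original, position, new = parse_mutation(label)
        if residues[position - 1] != original:
            raise ValueError(f"{label}: reference has {residues[position - 1]!r}")
        residues[position - 1] = new
    return "".join(residues[truncation:])
```

Positions are interpreted against the untruncated reference, so the substitutions are applied before the N-terminal residues are removed.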

Disclosed herein, in some embodiments, are methods for nucleic acid sample processing, comprising: (a) providing a template ribonucleic acid (RNA) molecule in a reaction volume, and (b) using an engineered reverse transcription enzyme to reverse transcribe said RNA molecule to a complementary DNA molecule, wherein said engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 3, wherein said amino acid sequence comprises: (i) a truncation of at least 15 amino acids from the N-terminus relative to SEQ ID NO: 3; and (ii) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. In some instances, said reaction volume is less than 1 nanoliter. In some instances, said reaction volume is less than 500 picoliters. In some instances, said reaction volume is a droplet in an emulsion. In some instances, said reaction volume is a well. In some instances, said reaction volume further comprises a plurality of nucleic acid barcode molecules comprising a barcode sequence. In some instances, said RNA molecule is a messenger RNA (mRNA) molecule, wherein said plurality of nucleic acid barcode molecules further comprise an oligo(dT) sequence, and wherein said engineered reverse transcription enzyme reverse transcribes said mRNA molecule into said complementary DNA molecule using said oligo(dT) sequence, wherein said complementary DNA molecule comprises said barcode sequence.
In some instances, said RNA molecule is a messenger RNA (mRNA) molecule, wherein said reaction volume further comprises a nucleic acid molecule comprising an oligo(dT) sequence, wherein said plurality of nucleic acid barcode molecules further comprise a template switching sequence, wherein said engineered reverse transcription enzyme reverse transcribes said mRNA molecule using said nucleic acid molecule comprising said oligo(dT) sequence, and wherein said engineered reverse transcription enzyme performs a template switching reaction, thereby generating said complementary DNA molecule, wherein said complementary DNA molecule comprises said barcode sequence. In some instances, said plurality of nucleic acid barcode molecules are attached to a support. In some instances, said nucleic acid barcode molecules are releasably attached to said support. In some instances, said support is a bead. In some instances, said bead is a gel bead. In some instances, said nucleic acid barcode molecules are covalently attached to said bead. In some instances, said nucleic acid barcode molecules are releasably attached to said bead. In some instances, said nucleic acid barcode molecules are released upon application of a stimulus. In some instances, said stimulus is a chemical stimulus. In some instances, said chemical stimulus is a reducing agent. In some instances, said gel bead is a degradable gel bead. In some instances, said degradable gel bead comprises chemically cleavable cross-linking. In some instances, said chemically cleavable cross-linking comprises disulfide cross-linking. In some instances, said reaction volume comprises a cell comprising said RNA molecule. In some instances, the method further comprises releasing said RNA molecule from said cell.

Disclosed herein, in some embodiments, are kits for performing a reverse transcription reaction, comprising: (a) an engineered reverse transcription enzyme comprising (i) a truncation of at least 15 amino acids from the N-terminus relative to SEQ ID NO: 3; and (ii) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3; and (b) instructions for using said engineered reverse transcription enzyme to perform a reverse transcription reaction. In some instances, said kit further comprises a reaction buffer for performing said reverse transcription reaction. In some instances, said kit further comprises dNTPs. In some instances, said engineered reverse transcription enzyme, said buffer, and said dNTPs are provided together in a master mix solution. In some instances, said master mix is present at a concentration at least two times the working concentration indicated in said instructions for use in said reverse transcription reaction. In some instances, said kit further comprises a primer for priming said reverse transcription reaction. In some instances, said primer is a poly-dT primer, a random N-mer primer, or a target-specific primer.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:

FIG. 1 shows an example of a microfluidic channel structure for partitioning individual or small groups of cells.

FIG. 2 shows an example of a microfluidic channel structure for co-partitioning cells and beads or microcapsules comprising additional reagents.

FIG. 3 schematically illustrates an example process for amplification and barcoding of a cell's nucleic acids.

FIG. 4 provides a schematic illustration of use of barcoding of a cell's nucleic acids in attributing sequence data to individual cells or groups of cells for use in their characterization.

FIG. 5 provides a schematic illustrating cells associated with labeled cell-binding ligands.

FIG. 6 provides a schematic illustration of an example workflow for performing RNA analysis using the methods described herein.

FIG. 7 provides a schematic illustration of an example barcoded oligonucleotide structure for use in analysis of ribonucleic acid (RNA) using the methods described herein.

FIG. 8 provides an image of individual cells co-partitioned along with individual barcode bearing beads.

FIG. 9A-FIG. 9E provide schematic illustrations of example barcoded oligonucleotide structures for use in analysis of RNA and example operations for performing RNA analysis.

FIG. 10 provides a schematic illustration of an example barcoded oligonucleotide structure for use in example analysis of RNA and use of a sequence for in vitro transcription.

FIG. 11 provides a schematic illustration of an example barcoded oligonucleotide structure for use in analysis of RNA and example operations for performing RNA analysis.

FIG. 12A-FIG. 12B provide schematic illustrations of an example barcoded oligonucleotide structure for use in analysis of RNA.

FIG. 13A-FIG. 13C provide illustrations of example yields from template switch reverse transcription and PCR in partitions.

FIG. 14A-FIG. 14B provide illustrations of example yields from reverse transcription and cDNA amplification in partitions with various cell numbers.

FIG. 15 provides an illustration of example yields from cDNA synthesis and real-time quantitative PCR at various input cell concentrations and also the effect of varying primer concentration on yield at a fixed cell input concentration.

FIG. 16 provides an illustration of example yields from in vitro transcription.

FIG. 17 shows an example computer control system that is programmed or otherwise configured to implement methods provided herein.

FIG. 18 provides an illustration of example sequencing saturation results in picoliter-sized droplets containing an engineered RT enzyme compared to a commercially available counterpart.

FIG. 19 provides an illustration of example median genes per cell (human genome hg19) obtained from cDNA libraries prepared in picoliter-sized droplets containing an engineered RT enzyme compared to a commercially available counterpart.

FIG. 20 provides an illustration of example median genes per cell (mouse genome mm10) obtained from cDNA libraries prepared in picoliter-sized droplets containing an engineered RT enzyme compared to a commercially available counterpart.

FIG. 21 shows an exemplary productive pair comparison from TCR transcriptional profiling prepared from droplets containing an engineered RT enzyme compared to a commercially available counterpart.

FIG. 22 shows various exemplary results from TCR transcriptional profiling prepared from droplets containing an engineered RT enzyme compared to a commercially available counterpart.

DETAILED DESCRIPTION

While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

Where values are described as ranges, it will be understood that such disclosure includes the disclosure of all possible sub-ranges within such ranges, as well as specific numerical values that fall within such ranges irrespective of whether a specific numerical value or specific sub-range is expressly stated.

The term “barcode,” as used herein, generally refers to a label, or identifier, that conveys or is capable of conveying information about an analyte. A barcode can be part of an analyte. A barcode can be independent of an analyte. A barcode can be a tag attached to an analyte (e.g., nucleic acid molecule) or a combination of the tag in addition to an endogenous characteristic of the analyte (e.g., size of the analyte or end sequence(s)). A barcode may be unique. Barcodes can have a variety of different formats. For example, barcodes can include: polynucleotide barcodes; random nucleic acid and/or amino acid sequences; and synthetic nucleic acid and/or amino acid sequences. A barcode can be attached to an analyte in a reversible or irreversible manner. A barcode can be added to, for example, a fragment of a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sample before, during, and/or after sequencing of the sample. Barcodes can allow for identification and/or quantification of individual sequencing-reads.

The terms “adaptor(s)”, “adapter(s)” and “tag(s)” may be used synonymously. An adaptor or tag can be coupled to a polynucleotide sequence to be “tagged” by any approach, including ligation, hybridization, or other approaches.

The term “sequencing,” as used herein, generally refers to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides. The polynucleotides can be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA). Sequencing can be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina®, Pacific Biosciences (PacBio®), Oxford Nanopore®, or Life Technologies (Ion Torrent®). Alternatively or in addition, sequencing may be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR, quantitative PCR, or real time PCR), or isothermal amplification. Such systems may provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., human), as generated by the systems from a sample provided by the subject. In some examples, such systems provide sequencing reads (also “reads” herein). A read may include a string of nucleic acid bases corresponding to a sequence of a nucleic acid molecule that has been sequenced. In some situations, systems and methods provided herein may be used with proteomic information.

The term “bead,” as used herein, generally refers to a particle. The bead may be a solid or semi-solid particle. The bead may be a gel bead. The gel bead may include a polymer matrix (e.g., matrix formed by polymerization or cross-linking). The polymer matrix may include one or more polymers (e.g., polymers having different functional groups or repeat units). Polymers in the polymer matrix may be randomly arranged, such as in random copolymers, and/or have ordered structures, such as in block copolymers. Cross-linking can be via covalent, ionic, or inductive interactions, or physical entanglement. The bead may be a macromolecule. The bead may be formed of nucleic acid molecules bound together. The bead may be formed via covalent or non-covalent assembly of molecules (e.g., macromolecules), such as monomers or polymers. Such polymers or monomers may be natural or synthetic. Such polymers or monomers may be or include, for example, nucleic acid molecules (e.g., DNA or RNA). The bead may be formed of a polymeric material. The bead may be magnetic or non-magnetic. The bead may be rigid. The bead may be flexible and/or compressible. The bead may be disruptable or dissolvable. The bead may be a solid particle (e.g., a metal-based particle including but not limited to iron oxide, gold or silver) covered with a coating comprising one or more polymers. Such coating may be disruptable or dissolvable.

The term “sample,” as used herein, generally refers to a biological sample of a subject. The biological sample may comprise any number of macromolecules, for example, cellular macromolecules. The sample may be a cell sample. The sample may be a cell line or cell culture sample. The sample can include one or more cells. The sample can include one or more microbes. The biological sample may be a nucleic acid sample or protein sample. The biological sample may also be a carbohydrate sample or a lipid sample. The biological sample may be derived from another sample. The sample may be a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate. The sample may be a fluid sample, such as a blood sample, urine sample, or saliva sample. The sample may be a skin sample. The sample may be a cheek swab. The sample may be a plasma or serum sample. The sample may be a cell-free or cell free sample. A cell-free sample may include extracellular polynucleotides. Extracellular polynucleotides may be isolated from a bodily sample that may be selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.

The term “subject,” as used herein, generally refers to an animal, such as a mammal (e.g., human) or avian (e.g., bird), or other organism, such as a plant. For example, the subject can be a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian or a human. Animals may include, but are not limited to, farm animals, sport animals, and pets. A subject can be a healthy or asymptomatic individual, an individual that has or is suspected of having a disease (e.g., cancer) or a pre-disposition to the disease, and/or an individual that is in need of therapy or suspected of needing therapy. A subject can be a patient. A subject can be a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses).

The term “molecular tag,” as used herein, generally refers to a molecule capable of binding to a macromolecular constituent. The molecular tag may bind to the macromolecular constituent with high affinity. The molecular tag may bind to the macromolecular constituent with high specificity. The molecular tag may comprise a nucleotide sequence. The molecular tag may comprise a nucleic acid sequence. The nucleic acid sequence may be at least a portion or an entirety of the molecular tag. The molecular tag may be a nucleic acid molecule or may be part of a nucleic acid molecule. The molecular tag may be an oligonucleotide or a polypeptide. The molecular tag may comprise a DNA aptamer. The molecular tag may be or comprise a primer. The molecular tag may be, or comprise, a protein. The molecular tag may comprise a polypeptide. The molecular tag may be a barcode.

The term “partition,” as used herein, generally refers to a space or volume that may be suitable to contain one or more species or conduct one or more reactions. A partition may be a physical compartment, such as a droplet or well. The partition may isolate space or volume from another space or volume. The droplet may be a first phase (e.g., aqueous phase) in a second phase (e.g., oil) immiscible with the first phase. The droplet may be a first phase in a second phase that does not phase separate from the first phase, such as, for example, a capsule or liposome in an aqueous phase. A partition may comprise one or more other (inner) partitions. In some cases, a partition may be a virtual compartment that can be defined and identified by an index (e.g., indexed libraries) across multiple and/or remote physical compartments. For example, a physical compartment may comprise a plurality of virtual compartments.

I. Single Cell Analysis

Advanced nucleic acid sequencing technologies have yielded monumental results in sequencing biological materials, including providing substantial sequence information on individual organisms, and relatively pure biological samples. However, these systems have not proven effective at being able to identify and characterize sub-populations of cells in biological samples that may represent a smaller minority of the overall make-up of the sample, but for which individualized sequence information could prove even more valuable.

Most nucleic acid sequencing technologies derive the nucleic acids that they sequence from collections of cells derived from tissue or other samples. The cells can be processed, en masse, to extract the genetic material that represents an average of the population of cells, which can then be processed into sequencing ready DNA libraries that are configured for a given sequencing technology. As will be appreciated, although often discussed in terms of DNA or nucleic acids, the nucleic acids derived from the cells may include DNA, or RNA, including, e.g., mRNA, total RNA, or the like, that may be processed to produce cDNA for sequencing, e.g., using any of a variety of RNA-seq methods. Following from this processing, absent a cell specific marker, attribution of genetic material as being contributed by a subset of cells or all cells in a sample is virtually impossible in such an ensemble approach.

In addition to the inability to attribute characteristics to particular subsets of populations of cells, such ensemble sample preparation methods also are, from the outset, predisposed to primarily identifying and characterizing the majority constituents in the sample of cells, and are not designed to be able to pick out the minority constituents, e.g., genetic material contributed by one cell, a few cells, or a small percentage of total cells in the sample. Likewise, where analyzing expression levels, e.g., of mRNA, an ensemble approach would be predisposed to presenting potentially grossly inaccurate data from cell populations that are non-homogeneous in terms of expression levels. In some cases, where expression is high in a small minority of the cells in an analyzed population, and absent in the majority of the cells of the population, an ensemble method would indicate low level expression for the entire population.
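
The averaging effect described above can be shown with a small numeric sketch. The population below is hypothetical (5 of 100 cells each carrying 200 transcripts of a gene and the remaining 95 carrying none) and is chosen only to make the arithmetic easy to follow.

```python
def ensemble_mean(per_cell_counts):
    """Average transcript count per cell, as a bulk (ensemble) measurement would report."""
    return sum(per_cell_counts) / len(per_cell_counts)


def expressing_fraction(per_cell_counts):
    """Fraction of cells with any detectable transcripts, visible only per cell."""
    return sum(1 for count in per_cell_counts if count > 0) / len(per_cell_counts)


# Hypothetical population: a small minority of high expressers, the rest silent.
population = [200] * 5 + [0] * 95

bulk_view = ensemble_mean(population)            # uniform-looking, low-level signal
single_cell_view = expressing_fraction(population)
```

In this sketch the bulk value is 10 transcripts per cell, which no individual cell in the population actually has; the per-cell view recovers that 5% of cells account for all of the signal.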

This original majority bias is further magnified, and can even become overwhelming, through processing operations used in building up the sequencing libraries from these samples. In particular, most next generation sequencing technologies rely upon the geometric amplification of nucleic acid fragments, such as the polymerase chain reaction, in order to produce sufficient DNA for the sequencing library. However, such geometric amplification is biased toward amplification of majority constituents in a sample, and may not preserve the starting ratios of such minority and majority components. By way of example, if a sample includes 95% DNA from a particular cell type in a sample, e.g., host tissue cells, and 5% DNA from another cell type, e.g., cancer cells, PCR based amplification can preferentially amplify the majority DNA in place of the minority DNA, both as a function of comparative exponential amplification (the repeated doubling of the higher concentration quickly outpaces that of the smaller fraction) and as a function of sequestration of amplification reagents and resources (as the larger fraction is amplified, it preferentially utilizes primers and other amplification reagents).
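
The 95%/5% example can be explored with a simple geometric growth model. This is an illustrative sketch only: the per-cycle efficiencies, cycle count, and starting copy numbers are assumptions, and assigning the minority template a lower efficiency is used here merely as a stand-in for the reagent sequestration effect described above.

```python
def amplify(start_copies, efficiency, cycles):
    """Idealized geometric amplification: each cycle multiplies copies by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


def minority_fraction(majority_copies, minority_copies):
    """Share of the total pool contributed by the minority template."""
    return minority_copies / (majority_copies + minority_copies)


CYCLES = 20  # assumed cycle number

# Equal efficiency: the ratio is preserved while the absolute gap doubles every cycle.
equal = minority_fraction(amplify(95, 1.0, CYCLES), amplify(5, 1.0, CYCLES))
gap = amplify(95, 1.0, CYCLES) - amplify(5, 1.0, CYCLES)

# Assumed lower efficiency for the minority template: its share of the pool shrinks.
skewed = minority_fraction(amplify(95, 1.0, CYCLES), amplify(5, 0.9, CYCLES))
```

Under these assumptions, even a modest per-cycle disadvantage compounds over many cycles, so the minority template ends up at well under its starting 5% share of the amplified pool.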

While some of these difficulties may be addressed by utilizing different sequencing systems, such as single molecule systems that do not require amplification, the single molecule systems, as well as the ensemble sequencing methods of other next generation sequencing systems, can also have large input requirements.

II. Compartmentalization and Characterization of Cells

Disclosed herein, however, are methods and systems for characterizing nucleic acids from small populations of cells, and in some cases, for characterizing nucleic acids from individual cells, especially in the context of larger populations of cells. The methods and systems combine the attribution advantages of the non-amplified single molecule methods with the high throughput of the other next generation systems, with the additional advantage of being able to process and sequence extremely low amounts of input nucleic acids derivable from individual cells or small collections of cells.

In particular, the methods described herein compartmentalize the analysis of individual cells or small populations of cells, including e.g., nucleic acids from individual cells or small groups of cells, and then allow that analysis to be attributed back to the individual cell or small group of cells from which the nucleic acids were derived. This can be accomplished regardless of whether the cell population represents a 50/50 mix of cell types, a 90/10 mix of cell types, or virtually any ratio of cell types, as well as a complete heterogeneous mix of different cell types, or any mixture between these. Differing cell types may include cells or biologic organisms from different tissue types of an individual, from different individuals, from differing genera, species, strains, variants, or any combination of any or all of the foregoing. For example, differing cell types may include normal and tumor tissue from an individual, multiple different bacterial species, strains and/or variants from environmental, forensic, microbiome or other samples, or any of a variety of other mixtures of cell types.
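
The attribution step described above can be sketched as a grouping of sequence reads by barcode. The read layout assumed here (a fixed-length barcode at the start of each read, followed by the insert) and the function name are hypothetical and are used only for illustration; the disclosure does not prescribe a particular read structure or software.

```python
from collections import defaultdict


def group_reads_by_barcode(reads, barcode_length):
    """Group read inserts by their leading barcode (assumed layout: barcode + insert)."""
    groups = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


# Toy reads with 4-base barcodes; sequences are made up for the example.
toy_reads = ["AAAACGT", "CCCCTTG", "AAAAGGA"]
by_partition = group_reads_by_barcode(toy_reads, barcode_length=4)
```

Reads sharing a barcode are attributed to the same partition, and therefore to the same cell or small group of cells, regardless of the proportions of cell types in the starting sample.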

In one aspect, the methods and systems described herein provide for the compartmentalization, depositing or partitioning of the nucleic acid contents of individual cells from a sample material containing cells, into discrete compartments or partitions (referred to interchangeably herein as partitions), where each partition maintains separation of its own contents from the contents of other partitions. The partition can be a droplet in an emulsion. A partition may comprise one or more other partitions.

A partition may include one or more cells. A partition may include one or more types of cells. A partition may comprise one or more gel beads. A partition may comprise one or more cell beads. A partition may include a single gel bead, a single cell bead, or both a single cell bead and single gel bead. A partition may include one or more reagents. Alternatively, a partition may be unoccupied. For example, a partition may not comprise a bead. A cell bead can be a cell encased inside of a gel or polymer matrix, such as via polymerization of a droplet containing the cell and precursors capable of being polymerized or gelled. Unique identifiers, such as barcodes, may be injected into the droplets previous to, subsequent to, or concurrently with droplet generation, such as via a microcapsule (e.g., bead), as described elsewhere herein. Microfluidic channel networks (e.g., on a chip) can be utilized to generate partitions as described herein. Alternative mechanisms may also be employed in the partitioning of the nucleic acid contents of individual cells, including porous membranes through which aqueous mixtures of cells are extruded into non-aqueous fluids.

As used herein, in some aspects, the partitions refer to containers or vessels (such as wells, microwells, tubes, through ports in nanoarray substrates, e.g., BioTrove nanoarrays, or other containers). In some aspects, however, the compartments or partitions comprise partitions that are flowable within fluid streams. The partitions may comprise, for example, micro-vesicles that have an outer barrier surrounding an inner fluid center or core. In some cases, the partitions may comprise a porous matrix that is capable of entraining and/or retaining materials within its matrix. The partitions can be droplets of a first phase within a second phase, wherein the first and second phases are immiscible. For example, the partitions can be droplets of aqueous fluid within a non-aqueous continuous phase (e.g., oil phase). In another example, the partitions can be droplets of a non-aqueous fluid within an aqueous phase. In some examples, the partitions may be provided in a water-in-oil emulsion or oil-in-water emulsion. A variety of different vessels are described in, for example, U.S. Patent Application Publication No. 2014/0155295, which is entirely incorporated herein by reference for all purposes. Emulsion systems for creating stable droplets in non-aqueous or oil continuous phases are described in, for example, U.S. Patent Application Publication No. 2010/0105112, which is entirely incorporated herein by reference for all purposes.

In the case of droplets in an emulsion, allocating individual cells to discrete partitions may in one non-limiting example be accomplished by introducing a flowing stream of particles in an aqueous fluid into a flowing stream of a non-aqueous fluid, such that droplets are generated at the junction of the two streams. Fluid properties (e.g., fluid flow rates, fluid viscosities, etc.), particle properties (e.g., volume fraction, particle size, particle concentration, etc.), microfluidic architectures (e.g., channel geometry, etc.), and other parameters may be adjusted to control the occupancy of the resulting partitions (e.g., number of cells per partition, number of beads per partition, etc.). For example, partition occupancy can be controlled by providing the aqueous stream at a certain concentration and/or flow rate of particles. To generate single cell partitions, the relative flow rates of the immiscible fluids can be selected such that, on average, the partitions may contain less than one cell per partition in order to ensure that those partitions that are occupied are primarily singly occupied. In some cases, partitions among a plurality of partitions may contain at most one biological particle (e.g., bead, DNA, cell or cellular material). In some embodiments, the various parameters (e.g., fluid properties, particle properties, microfluidic architectures, etc.) may be selected or adjusted such that a majority of partitions are occupied, for example, allowing for only a small percentage of unoccupied partitions. The flows and channel architectures can be controlled so as to ensure a given number of singly occupied partitions, less than a certain level of unoccupied partitions and/or less than a certain level of multiply occupied partitions.

In certain cases, microfluidic channel networks are particularly suited for generating partitions as described herein. Examples of such microfluidic devices include those described in detail in Provisional U.S. Patent Application No. 61/977,804, filed Apr. 4, 2014, the full disclosure of which is incorporated herein by reference in its entirety for all purposes. Alternative mechanisms may also be employed in the partitioning of individual cells, including porous membranes through which aqueous mixtures of cells are extruded into non-aqueous fluids. Such systems are generally available from, e.g., Nanomi, Inc.

FIG. 1 shows an example of a microfluidic channel structure 100 for partitioning individual cells. The channel structure 100 can include channel segments 102, 104, 106 and 108 communicating at a channel junction 110. In operation, a first aqueous fluid 112 that includes suspended individual cells 114 may be transported along channel segment 102 into junction 110, while a second fluid 116 that is immiscible with the aqueous fluid 112 is delivered to the junction 110 from each of channel segments 104 and 106 to create discrete droplets 118, 120 of the first aqueous fluid 112 flowing into channel segment 108, and flowing away from junction 110. The channel segment 108 may be fluidically coupled to an outlet reservoir where the discrete droplets can be stored and/or harvested. A discrete droplet generated may include an individual cell 114 (such as droplet 118). A discrete droplet generated may include more than one individual cell 114 (not shown in FIG. 1). A discrete droplet may contain no cell 114 (such as droplet 120). Each discrete partition may maintain separation of its own contents (e.g., individual cell 114) from the contents of other partitions.

The second fluid 116 can comprise an oil, such as a fluorinated oil, that includes a fluorosurfactant for stabilizing the resulting droplets, for example, inhibiting subsequent coalescence of the resulting droplets 118, 120. Examples of particularly useful partitioning fluids and fluorosurfactants are described, for example, in U.S. Patent Application Publication No. 2010/0105112, which is entirely incorporated herein by reference for all purposes.

As will be appreciated, the channel segments described herein may be coupled to any of a variety of different fluid sources or receiving components, including reservoirs, tubing, manifolds, or fluidic components of other systems. As will be appreciated, the microfluidic channel structure 100 may have other geometries. For example, a microfluidic channel structure can have more than one channel junction. For example, a microfluidic channel structure can have 2, 3, 4, or 5 channel segments each carrying particles (e.g., cells, cell beads, and/or gel beads) that meet at a channel junction. Fluid may be directed to flow along one or more channels or reservoirs via one or more fluid flow units. A fluid flow unit can comprise compressors (e.g., providing positive pressure), pumps (e.g., providing negative pressure), actuators, and the like to control flow of the fluid. Fluid may also or otherwise be controlled via applied pressure differentials, centrifugal force, electrokinetic pumping, vacuum, capillary or gravity flow, or the like.

The generated droplets may comprise two subsets of droplets: (1) occupied droplets 118, containing one or more cells 114, and (2) unoccupied droplets 120, not containing any cells 114. Occupied droplets 118 may comprise singly occupied droplets (having one cell) and multiply occupied droplets (having more than one cell). As described elsewhere herein, in some cases, the majority of occupied partitions can include no more than one cell per occupied partition and some of the generated partitions can be unoccupied (of any cell). In some cases, though, some of the occupied partitions may include more than one cell. In many cases, the systems and methods are used to ensure that the substantial majority of occupied partitions (partitions containing one or more microcapsules) include no more than 1 cell per occupied partition. In some cases, the partitioning process may be controlled such that fewer than about 25% of the occupied partitions contain more than one cell, and in many cases, fewer than about 20% of the occupied partitions have more than one cell, while in some cases, fewer than about 10% or even fewer than about 5% of the occupied partitions include more than one cell per partition.
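By way of illustration only, the relationship between cell loading and the occupancy levels discussed above can be sketched numerically. The sketch assumes that cells arrive independently, so that the number of cells per partition follows a Poisson distribution with mean `lam`; this is a modeling assumption, not a requirement of the methods described herein, and the function names are hypothetical.

```python
import math


def occupancy_fractions(lam: float) -> dict:
    """Fractions of partitions by cell count, assuming Poisson loading.

    lam is the assumed mean number of cells per partition (lam > 0).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    unoccupied = math.exp(-lam)          # P(0 cells)
    single = lam * math.exp(-lam)        # P(exactly 1 cell)
    occupied = -math.expm1(-lam)         # P(>= 1 cell), computed stably
    multiple = occupied - single         # P(>= 2 cells)
    return {
        "unoccupied": unoccupied,
        "singly_occupied": single,
        "multiply_occupied": multiple,
        # Share of occupied partitions holding more than one cell:
        # 1 - lam / (e^lam - 1)
        "multiplet_share_of_occupied": 1.0 - lam / math.expm1(lam),
    }


def max_mean_for_multiplet_limit(limit: float) -> float:
    """Largest mean loading whose multiplet share of occupied partitions
    stays below `limit`, found by bisection (the share rises with lam)."""
    if not 0.0 < limit < 0.99:
        raise ValueError("limit must be between 0 and 0.99")
    lo, hi = 1e-6, 10.0
    while hi - lo > 1e-9:
        mid = (lo + hi) / 2.0
        if occupancy_fractions(mid)["multiplet_share_of_occupied"] < limit:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for limit in (0.25, 0.20, 0.10, 0.05):
        lam = max_mean_for_multiplet_limit(limit)
        empty = occupancy_fractions(lam)["unoccupied"]
        print(f"multiplet limit {limit:.0%}: mean load {lam:.3f}, "
              f"unoccupied {empty:.1%}")
```

Under this assumed model, lowering the permitted share of multiply occupied partitions requires a lower mean loading and therefore a larger share of unoccupied partitions.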

In some cases, it may be desirable to minimize the creation of excessive numbers of empty partitions, such as to reduce costs and/or increase efficiency. While this minimization may be achieved by providing a sufficient number of cells (e.g., 114) at the partitioning junction 110, such as to ensure that at least one cell is encapsulated in a partition, the Poissonian distribution may expectedly increase the number of partitions that include multiple cells. As such, where singly occupied partitions are to be obtained, at most about 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5% or less of the generated partitions can be unoccupied. As such, in accordance with aspects described herein, the flow of one or more of the cells, or other fluids directed into the partitioning zone, is controlled such that, in many cases, no more than 50% of the generated partitions, no more than 25% of the generated partitions, or no more than 10% of the generated partitions are unoccupied, i.e., include less than 1 cell. Further, in some aspects, these flows are controlled so as to present a non-Poissonian distribution of singly occupied partitions while providing lower levels of unoccupied partitions. Restated, in some aspects, the above noted ranges of unoccupied partitions can be achieved while still providing any of the single occupancy rates described above. For example, in many cases, the use of the systems and methods described herein creates resulting partitions that have multiple occupancy rates of less than 25%, less than 20%, less than 15%, less than 10%, and in many cases, less than 5%, while having unoccupied partitions of less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, and in some cases, less than 5%.

In some cases, the flow of one or more of the biological particles (e.g., in channel segment 102), or other fluids directed into the partitioning junction (e.g., in channel segments 104, 106) can be controlled such that, in many cases, no more than about 50% of the generated partitions, no more than about 25% of the generated partitions, or no more than about 10% of the generated partitions are unoccupied. These flows can be controlled so as to present a non-Poissonian distribution of single-occupied partitions while providing lower levels of unoccupied partitions. The above noted ranges of unoccupied partitions can be achieved while still providing any of the single occupancy rates described above. For example, in many cases, the use of the systems and methods described herein can create resulting partitions that have multiple occupancy rates of less than about 25%, less than about 20%, less than about 15%, less than about 10%, and in many cases, less than about 5%, while having unoccupied partitions of less than about 50%, less than about 40%, less than about 30%, less than about 20%, less than about 10%, less than about 5%, or less.

As will be appreciated, the above-described occupancy rates are also applicable to partitions that include both cells and additional reagents, including, but not limited to, microcapsules or beads (e.g., gel beads) carrying barcoded nucleic acid molecules (e.g., oligonucleotides) (described in relation to FIG. 2). The occupied partitions (e.g., at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the occupied partitions) can include both a microcapsule (e.g., bead) comprising barcoded nucleic acid molecules and a cell. In particular, it may be desirable to provide that at least 50% of the partitions are occupied by at least one cell and at least one bead, or at least 75% of the partitions may be so occupied, or even at least 80% or at least 90% of the partitions may be so occupied. Further, in those cases where it is desired to provide a single cell and a single bead within a partition, at least 50% of the partitions can be so occupied, at least 60%, at least 70%, at least 80% or even at least 90% of the partitions can be so occupied.
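As a non-limiting sketch of how co-occupancy by a cell and a bead might be estimated, the following assumes that cells and beads are loaded independently of one another, that cells follow a Poisson distribution with mean `lam_cell`, and that bead loading is summarized by directly supplied probabilities (which need not be Poissonian). These assumptions, the function name, and the parameter names are illustrative and are not taken from the disclosure.

```python
import math


def joint_occupancy(lam_cell: float, p_one_bead: float,
                    p_multi_bead: float = 0.0) -> dict:
    """Fractions of partitions co-occupied by cells and beads.

    lam_cell     -- assumed mean number of cells per partition (Poisson)
    p_one_bead   -- assumed probability a partition holds exactly one bead
    p_multi_bead -- assumed probability a partition holds more than one bead
    """
    if lam_cell <= 0:
        raise ValueError("lam_cell must be positive")
    if p_one_bead < 0 or p_multi_bead < 0 or p_one_bead + p_multi_bead > 1:
        raise ValueError("bead probabilities must be valid and sum to <= 1")
    p_one_cell = lam_cell * math.exp(-lam_cell)   # exactly one cell
    p_any_cell = -math.expm1(-lam_cell)           # at least one cell
    p_any_bead = p_one_bead + p_multi_bead        # at least one bead
    return {
        # Independence lets the joint probability factor into a product.
        "one_cell_and_one_bead": p_one_cell * p_one_bead,
        "at_least_one_cell_and_bead": p_any_cell * p_any_bead,
    }


if __name__ == "__main__":
    result = joint_occupancy(lam_cell=0.1, p_one_bead=0.9)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, raising the fraction of partitions holding exactly one cell and one bead depends on improving both loading processes, since the joint fraction cannot exceed the smaller of the two individual fractions.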

In another aspect, in addition to or as an alternative to droplet based partitioning, cells may be encapsulated within a microcapsule that comprises an outer shell, layer or porous matrix in which is entrained one or more individual cells or small groups of cells. The microcapsule may include other reagents. Encapsulation of cells may be performed by a variety of processes. Such processes may combine an aqueous fluid containing the cells with a polymeric precursor material that may be capable of being formed into a gel or other solid or semi-solid matrix upon application of a particular stimulus to the polymer precursor. Such stimuli can include, for example, thermal stimuli (e.g., either heating or cooling), photo-stimuli (e.g., through photo-curing), chemical stimuli (e.g., through crosslinking, polymerization initiation of the precursor (e.g., through added initiators)), mechanical stimuli, or a combination thereof.

Preparation of microcapsules comprising cells may be performed by a variety of methods. For example, air knife droplet or aerosol generators may be used to dispense droplets of precursor fluids into gelling solutions in order to form microcapsules that include individual cells or small groups of cells. Likewise, membrane based encapsulation systems may be used to generate microcapsules comprising encapsulated cells as described herein. Microfluidic systems of the present disclosure, such as that shown in FIG. 1, may be readily used in encapsulating cells as described herein. In particular, and with reference to FIG. 1, the aqueous fluid 112 comprising (i) the individual cells 114 and (ii) the polymer precursor material (not shown) is flowed into channel junction 110, where it is partitioned into droplets 118, 120 through the flow of non-aqueous fluid 116. In the case of encapsulation methods, non-aqueous fluid 116 may also include an initiator (not shown) to cause polymerization and/or crosslinking of the polymer precursor to form the microcapsule that includes the entrained cells. Examples of polymer precursor/initiator pairs include those described in U.S. Patent Application Publication No. 2014/0378345, which is entirely incorporated herein by reference for all purposes.

For example, in the case where the polymer precursor material comprises a linear polymer material, such as a linear polyacrylamide, PEG, or other linear polymeric material, the activation agent may comprise a cross-linking agent, or a chemical that activates a cross-linking agent within the formed droplets. Likewise, for polymer precursors that comprise polymerizable monomers, the activation agent may comprise a polymerization initiator. For example, in certain cases, where the polymer precursor comprises a mixture of acrylamide monomer with a N,N′-bis-(acryloyl)cystamine (BAC) comonomer, an agent such as tetramethylethylenediamine (TEMED) may be provided within the second fluid streams 116 in channel segments 104 and 106, which can initiate the copolymerization of the acrylamide and BAC into a cross-linked polymer network, or hydrogel.

Upon contact of the second fluid stream 116 with the first fluid stream 112 at junction 110, during formation of droplets, the TEMED may diffuse from the second fluid 116 into the aqueous fluid 112 comprising the linear polyacrylamide, which will activate the crosslinking of the polyacrylamide within the droplets 118, 120, resulting in the formation of gel (e.g., hydrogel) microcapsules, as solid or semi-solid beads or particles entraining the cells 114. Although described in terms of polyacrylamide encapsulation, other ‘activatable’ encapsulation compositions may also be employed in the context of the methods and compositions described herein. For example, formation of alginate droplets followed by exposure to divalent metal ions (e.g., Ca²⁺ ions), can be used as an encapsulation process using the described processes. Likewise, agarose droplets may also be transformed into capsules through temperature based gelling (e.g., upon cooling, etc.).

In some cases, encapsulated cells can be selectively releasable from the microcapsule, such as through passage of time or upon application of a particular stimulus that degrades the microcapsule sufficiently to allow the cell, or its other contents, to be released from the microcapsule, such as into a partition (e.g., droplet). For example, in the case of the polyacrylamide polymer described above, degradation of the microcapsule may be accomplished through the introduction of an appropriate reducing agent, such as DTT or the like, to cleave disulfide bonds that cross-link the polymer matrix. See, for example, U.S. Patent Application Publication No. 2014/0378345, which is entirely incorporated herein by reference for all purposes.

The cell can be subjected to other conditions sufficient to polymerize or gel the precursors. The conditions sufficient to polymerize or gel the precursors may comprise exposure to heating, cooling, electromagnetic radiation, and/or light. The conditions sufficient to polymerize or gel the precursors may comprise any conditions sufficient to polymerize or gel the precursors. Following polymerization or gelling, a polymer or gel may be formed around the cell. The polymer or gel may be diffusively permeable to chemical or biochemical reagents. The polymer or gel may be diffusively impermeable to macromolecular constituents of the cell. In this manner, the polymer or gel may act to allow the cell to be subjected to chemical or biochemical operations while spatially confining the nucleic acids to a region of the droplet defined by the polymer or gel. The polymer or gel may include one or more of disulfide cross-linked polyacrylamide, agarose, alginate, polyvinyl alcohol, polyethylene glycol (PEG)-diacrylate, PEG-acrylate, PEG-thiol, PEG-azide, PEG-alkyne, other acrylates, chitosan, hyaluronic acid, collagen, fibrin, gelatin, or elastin. The polymer or gel may comprise any other polymer or gel.

The polymer or gel may be functionalized to bind to targeted analytes, such as nucleic acids, proteins, carbohydrates, lipids or other analytes. The polymer or gel may be polymerized or gelled via a passive mechanism. The polymer or gel may be stable in alkaline conditions or at elevated temperature. The polymer or gel may have mechanical properties similar to the mechanical properties of the bead. For instance, the polymer or gel may be of a similar size to the bead. The polymer or gel may have a mechanical strength (e.g., tensile strength) similar to that of the bead. The polymer or gel may be of a lower density than an oil. The polymer or gel may be of a density that is roughly similar to that of a buffer. The polymer or gel may have a tunable pore size. The pore size may be chosen to, for instance, retain denatured nucleic acids. The pore size may be chosen to maintain diffusive permeability to exogenous chemicals such as sodium hydroxide (NaOH) and/or endogenous chemicals such as inhibitors. The polymer or gel may be biocompatible. The polymer or gel may maintain or enhance cell viability. The polymer or gel may be biochemically compatible. The polymer or gel may be polymerized and/or depolymerized thermally, chemically, enzymatically, and/or optically.

The polymer may comprise poly(acrylamide-co-acrylic acid) crosslinked with disulfide linkages. The preparation of the polymer may comprise a two-step reaction. In the first activation step, poly(acrylamide-co-acrylic acid) may be exposed to an acylating agent to convert carboxylic acids to esters. For instance, the poly(acrylamide-co-acrylic acid) may be exposed to 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM). The poly(acrylamide-co-acrylic acid) may be exposed to other salts of 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium. In the second cross-linking step, the ester formed in the first step may be exposed to a disulfide crosslinking agent. For instance, the ester may be exposed to cystamine (2,2′-dithiobis(ethylamine)). Following the two steps, the cell may be surrounded by polyacrylamide strands linked together by disulfide bridges. In this manner, the cell may be encased inside of or comprise a gel or matrix (e.g., polymer matrix) to form a “cell bead.” A cell bead can contain a cell or nucleic acids (e.g., RNA, DNA) of individual cells. A cell bead may include a single cell or multiple cells, or a derivative of the single cell or multiple cells. For example, after lysing and washing the cells, inhibitory components from cell lysates can be washed away and the nucleic acids can be bound as cell beads. Systems and methods disclosed herein can be applicable to both cell beads (and/or droplets or other partitions) containing individual cells and cell beads (and/or droplets or other partitions) containing nucleic acids of individual cells.

Encapsulated cells or cell populations can provide certain potential advantages of being more storable and more portable than droplet-based partitioned cells. Furthermore, in some cases, it may be desirable to allow cells to incubate for a select period of time before analysis, such as in order to characterize changes in such cells over time, either in the presence or absence of different stimuli. In such cases, encapsulation may allow for longer incubation than partitioning in emulsion droplets, although in some cases, droplet partitioned cells may also be incubated for different periods of time, e.g., at least 10 seconds, at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 1 hour, at least 2 hours, at least 5 hours, or at least 10 hours or more. The encapsulation of cells may constitute the partitioning of the cells into which other reagents are co-partitioned. Alternatively or in addition, encapsulated cells may be readily deposited into other partitions (e.g., droplets) as described above.

Beads

A partition may comprise one or more unique identifiers, such as barcodes. Barcodes may be previously, subsequently or concurrently delivered to the partitions that hold the compartmentalized or partitioned cell. For example, barcodes may be injected into droplets previous to, subsequent to, or concurrently with droplet generation. The delivery of the barcodes to a particular partition allows for the later attribution of the characteristics of the individual cell to the particular partition. Barcodes may be delivered, for example on a nucleic acid molecule (e.g., an oligonucleotide), to a partition via any suitable mechanism. Barcoded nucleic acid molecules can be delivered to a partition via a microcapsule. A microcapsule, in some instances, can comprise a bead. Beads are described in further detail below.

In some cases, barcoded nucleic acid molecules can be initially associated with the microcapsule and then released from the microcapsule. Release of the barcoded nucleic acid molecules can be passive (e.g., by diffusion out of the microcapsule). In addition or alternatively, release from the microcapsule can be upon application of a stimulus which allows the barcoded nucleic acid molecules to dissociate or to be released from the microcapsule. Such stimulus may disrupt the microcapsule, an interaction that couples the barcoded nucleic acid molecules to or within the microcapsule, or both. Such stimulus can include, for example, a thermal stimulus, photo-stimulus, chemical stimulus (e.g., change in pH or use of a reducing agent(s)), a mechanical stimulus, a radiation stimulus, a biological stimulus (e.g., enzyme), or any combination thereof.

FIG. 2 shows an example of a microfluidic channel structure 200 for delivering barcode carrying beads to droplets. The channel structure 200 can include channel segments 201, 202, 204, 206 and 208 communicating at a channel junction 210. In operation, the channel segment 201 may transport an aqueous fluid 212 that includes a plurality of beads 214 (e.g., with nucleic acid molecules, oligonucleotides, molecular tags) along the channel segment 201 into junction 210. The plurality of beads 214 may be sourced from a suspension of beads. For example, the channel segment 201 may be connected to a reservoir comprising an aqueous suspension of beads 214. The channel segment 202 may transport the aqueous fluid 212 that includes a plurality of cells 216 along the channel segment 202 into junction 210. The plurality of cells 216 may be sourced from a suspension of cells. For example, the channel segment 202 may be connected to a reservoir comprising an aqueous suspension of cells 216. In some instances, the aqueous fluid 212 in either the first channel segment 201 or the second channel segment 202, or in both segments, can include one or more reagents, as further described below. A second fluid 218 that is immiscible with the aqueous fluid 212 (e.g., oil) can be delivered to the junction 210 from each of channel segments 204 and 206. Upon meeting of the aqueous fluid 212 from each of channel segments 201 and 202 and the second fluid 218 from each of channel segments 204 and 206 at the channel junction 210, the aqueous fluid 212 can be partitioned as discrete droplets 220 in the second fluid 218 and flow away from the junction 210 along channel segment 208. The channel segment 208 may deliver the discrete droplets to an outlet reservoir fluidly coupled to the channel segment 208, where they may be harvested.

As an alternative, the channel segments 201 and 202 may meet at another junction upstream of the junction 210. At such junction, beads and cells may form a mixture that is directed along another channel to the junction 210 to yield droplets 220. The mixture may provide the beads and cells in an alternating fashion, such that, for example, a droplet comprises a single bead and a single cell.

Beads, cells and droplets may flow along channels at substantially regular flow profiles (e.g., at regular flow rates). Such regular flow profiles may permit a droplet to include a single bead and a single cell. Such regular flow profiles may permit the droplets to have an occupancy (e.g., droplets having beads and cells) greater than 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95%. Such regular flow profiles and devices that may be used to provide such regular flow profiles are provided in, for example, U.S. Patent Publication No. 2015/0292988, which is entirely incorporated herein by reference.

The second fluid 218 can comprise an oil, such as a fluorinated oil, that includes a fluorosurfactant for stabilizing the resulting droplets, for example, inhibiting subsequent coalescence of the resulting droplets 220.

A discrete droplet that is generated may include an individual cell 216. A discrete droplet that is generated may include a barcode or other reagent carrying bead 214. A discrete droplet generated may include both an individual cell and a barcode carrying bead, such as droplets 220. In some instances, a discrete droplet may include more than one individual cell or no cells. In some instances, a discrete droplet may include more than one bead or no bead. A discrete droplet may be unoccupied (e.g., no beads, no cells).

Beneficially, a discrete droplet partitioning a cell and a barcode carrying bead may effectively allow the attribution of the barcode to nucleic acids of the cell within the partition. The contents of a partition may remain discrete from the contents of other partitions.

As will be appreciated, the channel segments described herein may be coupled to any of a variety of different fluid sources or receiving components, including reservoirs, tubing, manifolds, or fluidic components of other systems. As will be appreciated, the microfluidic channel structure 200 may have other geometries. For example, a microfluidic channel structure can have more than one channel junction. For example, a microfluidic channel structure can have 2, 3, 4, or 5 channel segments each carrying beads that meet at a channel junction. Fluid may be directed to flow along one or more channels or reservoirs via one or more fluid flow units. A fluid flow unit can comprise compressors (e.g., providing positive pressure), pumps (e.g., providing negative pressure), actuators, and the like to control flow of the fluid. Fluid may also or otherwise be controlled via applied pressure differentials, centrifugal force, electrokinetic pumping, vacuum, capillary or gravity flow, or the like.

A bead may be porous, non-porous, solid, semi-solid, semi-fluidic, fluidic, and/or a combination thereof. In some instances, a bead may be dissolvable, disruptable, and/or degradable. In some cases, a bead may not be degradable. In some cases, the bead may be a gel bead. A gel bead may be a hydrogel bead. A gel bead may be formed from molecular precursors, such as a polymeric or monomeric species. A semi-solid bead may be a liposomal bead. Solid beads may comprise metals including iron oxide, gold, and silver. In some cases, the bead may be a silica bead. In some cases, the bead can be rigid. In other cases, the bead may be flexible and/or compressible.

A bead may be of any suitable shape. Examples of bead shapes include, but are not limited to, spherical, non-spherical, oval, oblong, amorphous, circular, cylindrical, and variations thereof.

Beads may be of uniform size or heterogeneous size. In some cases, the diameter of a bead may be at least about 10 nanometers (nm), 100 nm, 500 nm, 1 micrometer (μm), 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1 mm, or greater. In some cases, a bead may have a diameter of less than about 10 nm, 100 nm, 500 nm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1 mm, or less. In some cases, a bead may have a diameter in the range of about 40-75 μm, 30-75 μm, 20-75 μm, 40-85 μm, 40-95 μm, 20-100 μm, 10-100 μm, 1-100 μm, 20-250 μm, or 20-500 μm.

In certain aspects, beads can be provided as a population or plurality of beads having a relatively monodisperse size distribution. Where it may be desirable to provide relatively consistent amounts of reagents within partitions, maintaining relatively consistent bead characteristics, such as size, can contribute to the overall consistency. In particular, the beads described herein may have size distributions that have a coefficient of variation in their cross-sectional dimensions of less than 50%, less than 40%, less than 30%, less than 20%, and in some cases less than 15%, less than 10%, less than 5%, or less.
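For illustration, the coefficient of variation referred to above is the standard deviation of a set of measurements divided by their mean. The following minimal sketch computes it for a list of bead diameters; the sample values, the use of the sample (n-1) standard deviation, and the function names are assumptions made for the example only.

```python
import statistics


def coefficient_of_variation(diameters_um: list) -> float:
    """Return the CV (sample standard deviation / mean) of bead diameters."""
    if len(diameters_um) < 2:
        raise ValueError("at least two measurements are required")
    mean = statistics.fmean(diameters_um)
    if mean <= 0:
        raise ValueError("mean diameter must be positive")
    return statistics.stdev(diameters_um) / mean


def meets_cv_threshold(diameters_um: list, max_cv: float) -> bool:
    """True if the population's CV is below max_cv (e.g., 0.05 for 5%)."""
    return coefficient_of_variation(diameters_um) < max_cv


if __name__ == "__main__":
    # Hypothetical diameters, in micrometers, of five measured beads.
    sample = [50.0, 52.0, 48.0, 51.0, 49.0]
    cv = coefficient_of_variation(sample)
    print(f"CV = {cv:.2%}; below 5%: {meets_cv_threshold(sample, 0.05)}")
```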

A bead may comprise natural and/or synthetic materials. For example, a bead can comprise a natural polymer, a synthetic polymer or both natural and synthetic polymers. Examples of natural polymers include proteins and sugars such as deoxyribonucleic acid, rubber, cellulose, starch (e.g., amylose, amylopectin), proteins, enzymes, polysaccharides, silks, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, ispaghula, acacia, agar, gelatin, shellac, sterculia gum, xanthan gum, corn sugar gum, guar gum, gum karaya, agarose, alginic acid, alginate, or natural polymers thereof. Examples of synthetic polymers include acrylics, nylons, silicones, spandex, viscose rayon, polycarboxylic acids, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethanes, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(oxymethylene), polyformaldehyde, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene dichloride), poly(vinylidene difluoride), poly(vinyl fluoride) and/or combinations (e.g., co-polymers) thereof. Beads may also be formed from materials other than polymers, including lipids, micelles, ceramics, glass-ceramics, material composites, metals, other inorganic materials, and others.

In some instances, the bead may contain molecular precursors (e.g., monomers or polymers), which may form a polymer network via polymerization of the molecular precursors. In some cases, a precursor may be an already polymerized species capable of undergoing further polymerization via, for example, a chemical cross-linkage. In some cases, a precursor can comprise one or more of an acrylamide or a methacrylamide monomer, oligomer, or polymer. In some cases, the bead may comprise prepolymers, which are oligomers capable of further polymerization. For example, polyurethane beads may be prepared using prepolymers. In some cases, the bead may contain individual polymers that may be further polymerized together. In some cases, beads may be generated via polymerization of different precursors, such that they comprise mixed polymers, co-polymers, and/or block co-polymers. In some cases, the bead may comprise covalent or ionic bonds between polymeric precursors (e.g., monomers, oligomers, linear polymers), nucleic acid molecules (e.g., oligonucleotides), primers, and other entities. In some cases, the covalent bonds can be carbon-carbon bonds, thioether bonds, or carbon-hetero atom bonds.

Cross-linking may be permanent or reversible, depending upon the particular cross-linker used. Reversible cross-linking may allow for the polymer to linearize or dissociate under appropriate conditions. In some cases, reversible cross-linking may also allow for reversible attachment of a material bound to the surface of a bead. In some cases, a cross-linker may form disulfide linkages. In some cases, the chemical cross-linker forming disulfide linkages may be cystamine or a modified cystamine.

In some cases, disulfide linkages can be formed between molecular precursor units (e.g., monomers, oligomers, or linear polymers) or precursors incorporated into a bead and nucleic acid molecules (e.g., oligonucleotides). Cystamine (including modified cystamines), for example, is an organic agent comprising a disulfide bond that may be used as a crosslinker agent between individual monomeric or polymeric precursors of a bead. Polyacrylamide may be polymerized in the presence of cystamine or a species comprising cystamine (e.g., a modified cystamine) to generate polyacrylamide gel beads comprising disulfide linkages (e.g., chemically degradable beads comprising chemically-reducible cross-linkers). The disulfide linkages may permit the bead to be degraded (or dissolved) upon exposure of the bead to a reducing agent.

In some cases, chitosan, a linear polysaccharide polymer, may be crosslinked with glutaraldehyde via hydrophilic chains to form a bead. Crosslinking of chitosan polymers may be achieved by chemical reactions that are initiated by heat, pressure, change in pH, and/or radiation.

In some cases, a bead may comprise an acrydite moiety, which in certain aspects may be used to attach one or more nucleic acid molecules (e.g., barcode sequence, barcoded nucleic acid molecule, barcoded oligonucleotide, primer, or other oligonucleotide) to the bead. In some cases, an acrydite moiety can refer to an acrydite analogue generated from the reaction of acrydite with one or more species, such as the reaction of acrydite with other monomers and cross-linkers during a polymerization reaction. Acrydite moieties may be modified to form chemical bonds with a species to be attached, such as a nucleic acid molecule (e.g., barcode sequence, barcoded nucleic acid molecule, barcoded oligonucleotide, primer, or other oligonucleotide). Acrydite moieties may be modified with thiol groups capable of forming a disulfide bond or may be modified with groups already comprising a disulfide bond. The thiol or disulfide (via disulfide exchange) may be used as an anchor point for a species to be attached or another part of the acrydite moiety may be used for attachment. In some cases, attachment can be reversible, such that when the disulfide bond is broken (e.g., in the presence of a reducing agent), the attached species is released from the bead. In other cases, an acrydite moiety can comprise a reactive hydroxyl group that may be used for attachment.

Functionalization of beads for attachment of nucleic acid molecules (e.g., oligonucleotides) may be achieved through a wide range of different approaches, including activation of chemical groups within a polymer, incorporation of active or activatable functional groups in the polymer structure, or attachment at the pre-polymer or monomer stage in bead production.

For example, precursors (e.g., monomers, cross-linkers) that are polymerized to form a bead may comprise acrydite moieties, such that when a bead is generated, the bead also comprises acrydite moieties. The acrydite moieties can be attached to a nucleic acid molecule (e.g., oligonucleotide), which may include a priming sequence (e.g., a primer for amplifying target nucleic acids, random primer, primer sequence for messenger RNA) and/or one or more barcode sequences. The one or more barcode sequences may include sequences that are the same for all nucleic acid molecules coupled to a given bead and/or sequences that are different across all nucleic acid molecules coupled to the given bead. The nucleic acid molecule may be incorporated into the bead.

In some cases, the nucleic acid molecule can comprise a functional sequence, for example, for attachment to a sequencing flow cell, such as, for example, a P5 sequence for Illumina® sequencing. In some cases, the nucleic acid molecule or derivative thereof (e.g., oligonucleotide or polynucleotide generated from the nucleic acid molecule) can comprise another functional sequence, such as, for example, a P7 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the nucleic acid molecule can comprise a barcode sequence. In some cases, the primer can further comprise a unique molecular identifier (UMI). In some cases, the primer can comprise an R1 primer sequence for Illumina sequencing. In some cases, the primer can comprise an R2 primer sequence for Illumina sequencing. Examples of such nucleic acid molecules (e.g., oligonucleotides, polynucleotides, etc.) and uses thereof, as may be used with compositions, devices, methods and systems of the present disclosure, are provided in U.S. Patent Pub. Nos. 2014/0378345 and 2015/0376609, each of which is entirely incorporated herein by reference.

In operation, a cell can be co-partitioned along with a barcode-bearing bead. The barcoded nucleic acid molecules can be released from the bead in the partition. By way of example, in the context of analyzing sample RNA, the poly-dT (poly-deoxythymine, also referred to as oligo (dT)) segment of one of the released nucleic acid molecules can hybridize to the poly-A tail of an mRNA molecule. Reverse transcription may result in a cDNA transcript of the mRNA that includes each of the sequence segments of the nucleic acid molecule. Because the nucleic acid molecule comprises an anchoring sequence, it will more likely hybridize to and prime reverse transcription at the sequence end of the poly-A tail of the mRNA. Within any given partition, all of the cDNA transcripts of the individual mRNA molecules may include a common barcode sequence segment. However, the transcripts made from the different mRNA molecules within a given partition may vary at the unique molecular identifying sequence segment (e.g., UMI segment). Beneficially, even following any subsequent amplification of the contents of a given partition, the number of different UMIs can be indicative of the quantity of mRNA originating from a given partition, and thus from the cell. As noted above, the transcripts can be amplified, cleaned up and sequenced to identify the sequence of the cDNA transcript of the mRNA, as well as to sequence the barcode segment and the UMI segment. While a poly-dT primer sequence is described, other targeted or random priming sequences may also be used in priming the reverse transcription reaction. Likewise, although described as releasing the barcoded oligonucleotides into the partition, in some cases, the nucleic acid molecules bound to the bead (e.g., gel bead) may be used to hybridize and capture the mRNA on the solid phase of the bead, for example, in order to facilitate the separation of the RNA from other cell contents.
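By way of illustration only, the use of UMIs to estimate mRNA quantity after amplification can be sketched as follows. This is a minimal sketch and not a description of any particular analysis pipeline; the function name count_molecules, the read tuples, and the example barcode, UMI, and gene values are hypothetical and are assumed to have already been extracted from sequencing reads.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (barcode, gene).

    reads: iterable of (barcode, umi, gene) tuples (hypothetical input format).
    Returns {barcode: {gene: number_of_distinct_umis}}.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for barcode, umi, gene in reads:
        # Reads sharing barcode, gene, and UMI are treated as amplification
        # copies of one original transcript, so each UMI is counted once.
        seen[barcode][gene].add(umi)
    return {
        barcode: {gene: len(umis) for gene, umis in genes.items()}
        for barcode, genes in seen.items()
    }


# Hypothetical example data: two partitions (barcodes), three original molecules.
example_reads = [
    ("AAAC", "TTGCA", "ACTB"),
    ("AAAC", "TTGCA", "ACTB"),  # amplification duplicate of the read above
    ("AAAC", "GGATC", "ACTB"),
    ("CCGT", "TTGCA", "GAPDH"),
]
molecule_counts = count_molecules(example_reads)
```

In this sketch the common barcode groups transcripts by partition (and thus by cell), while the number of different UMIs, rather than the number of reads, serves as the estimate of starting mRNA molecules.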

In some cases, precursors comprising a functional group that is reactive or capable of being activated such that it becomes reactive can be polymerized with other precursors to generate gel beads comprising the activated or activatable functional group. The functional group may then be used to attach additional species (e.g., disulfide linkers, primers, other oligonucleotides, etc.) to the gel beads. For example, some precursors comprising a carboxylic acid (COOH) group can co-polymerize with other precursors to form a gel bead that also comprises a COOH functional group. In some cases, acrylic acid (a species comprising free COOH groups), acrylamide, and bis(acryloyl)cystamine can be co-polymerized together to generate a gel bead comprising free COOH groups. The COOH groups of the gel bead can be activated (e.g., via 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and N-Hydroxysuccinimide (NHS) or 4-(4,6-Dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM)) such that they are reactive (e.g., reactive to amine functional groups where EDC/NHS or DMTMM are used for activation). The activated COOH groups can then react with an appropriate species (e.g., a species comprising an amine functional group where the carboxylic acid groups are activated to be reactive with an amine functional group) comprising a moiety to be linked to the bead.

Beads comprising disulfide linkages in their polymeric network may be functionalized with additional species via reduction of some of the disulfide linkages to free thiols. The disulfide linkages may be reduced via, for example, the action of a reducing agent (e.g., DTT, TCEP, etc.) to generate free thiol groups, without dissolution of the bead. Free thiols of the beads can then react with free thiols of a species or a species comprising another disulfide bond (e.g., via thiol-disulfide exchange) such that the species can be linked to the beads (e.g., via a generated disulfide bond). In some cases, free thiols of the beads may react with any other suitable group. For example, free thiols of the beads may react with species comprising an acrydite moiety. The free thiol groups of the beads can react with the acrydite via Michael addition chemistry, such that the species comprising the acrydite is linked to the bead. In some cases, uncontrolled reactions can be prevented by inclusion of a thiol capping agent such as N-ethylmaleimide or iodoacetate.

Activation of disulfide linkages within a bead can be controlled such that only a small number of disulfide linkages are activated. Control may be exerted, for example, by controlling the concentration of a reducing agent used to generate free thiol groups and/or concentration of reagents used to form disulfide bonds in bead polymerization. In some cases, a low concentration (e.g., molecules of reducing agent:gel bead ratios of less than or equal to about 1:100,000,000,000, less than or equal to about 1:10,000,000,000, less than or equal to about 1:1,000,000,000, less than or equal to about 1:100,000,000, less than or equal to about 1:10,000,000, less than or equal to about 1:1,000,000, less than or equal to about 1:100,000, less than or equal to about 1:10,000) of reducing agent may be used for reduction. Controlling the number of disulfide linkages that are reduced to free thiols may be useful in ensuring bead structural integrity during functionalization. In some cases, optically-active agents, such as fluorescent dyes, may be coupled to beads via free thiol groups of the beads and used to quantify the number of free thiols present in a bead and/or track a bead.

In some cases, addition of moieties to a gel bead after gel bead formation may be advantageous. For example, addition of an oligonucleotide (e.g., barcoded oligonucleotide) after gel bead formation may avoid loss of the species during chain transfer termination that can occur during polymerization. Moreover, smaller precursors (e.g., monomers or cross linkers that do not comprise side chain groups and linked moieties) may be used for polymerization and can be minimally hindered from growing chain ends due to viscous effects. In some cases, functionalization after gel bead synthesis can minimize exposure of species (e.g., oligonucleotides) to be loaded with potentially damaging agents (e.g., free radicals) and/or chemical environments. In some cases, the generated gel may possess an upper critical solution temperature (UCST) that can permit temperature driven swelling and collapse of a bead. Such functionality may aid in oligonucleotide (e.g., a primer) infiltration into the bead during subsequent functionalization of the bead with the oligonucleotide. Post-production functionalization may also be useful in controlling loading ratios of species in beads, such that, for example, the variability in loading ratio is minimized. Species loading may also be performed in a batch process such that a plurality of beads can be functionalized with the species in a single batch.

A bead injected or otherwise introduced into a partition may comprise releasably, cleavably, or reversibly attached barcodes. A bead injected or otherwise introduced into a partition may comprise activatable barcodes. A bead injected or otherwise introduced into a partition may be a degradable, disruptable, or dissolvable bead.

Barcodes can be releasably, cleavably or reversibly attached to the beads such that barcodes can be released or be releasable through cleavage of a linkage between the barcode molecule and the bead, or released through degradation of the underlying bead itself, allowing the barcodes to be accessed or be accessible by other reagents, or both. In non-limiting examples, cleavage may be achieved through reduction of disulfide bonds, use of restriction enzymes, photo-activated cleavage, or cleavage via other types of stimuli (e.g., chemical, thermal, pH, enzymatic, etc.) and/or reactions, such as described elsewhere herein. Releasable barcodes may sometimes be referred to as being activatable, in that they are available for reaction once released. Thus, for example, an activatable barcode may be activated by releasing the barcode from a bead (or other suitable type of partition described herein). Other activatable configurations are also envisioned in the context of the described methods and systems.

In addition to, or as an alternative to, the cleavable linkages between the beads and the associated molecules, such as barcode containing nucleic acid molecules (e.g., barcoded oligonucleotides), the beads may be degradable, disruptable, or dissolvable spontaneously or upon exposure to one or more stimuli (e.g., temperature changes, pH changes, exposure to particular chemical species or phase, exposure to light, reducing agent, etc.). In some cases, a bead may be dissolvable, such that material components of the beads are solubilized when exposed to a particular chemical species or an environmental change, such as a change in temperature or a change in pH. In some cases, a gel bead can be degraded or dissolved at elevated temperature and/or in basic conditions. In some cases, a bead may be thermally degradable such that when the bead is exposed to an appropriate change in temperature (e.g., heat), the bead degrades. Degradation or dissolution of a bead bound to a species (e.g., a nucleic acid molecule, e.g., barcoded oligonucleotide) may result in release of the species from the bead.

As will be appreciated from the above disclosure, the degradation of a bead may refer to the disassociation of a bound or entrained species from a bead, both with and without structurally degrading the physical bead itself. For example, the degradation of the bead may involve cleavage of a cleavable linkage via one or more species and/or methods described elsewhere herein. In another example, entrained species may be released from beads through osmotic pressure differences due to, for example, changing chemical environments. By way of example, alteration of bead pore sizes due to osmotic pressure differences can generally occur without structural degradation of the bead itself. In some cases, an increase in pore size due to osmotic swelling of a bead can permit the release of entrained species within the bead. In other cases, osmotic shrinking of a bead may cause a bead to better retain an entrained species due to pore size contraction.

A degradable bead may be introduced into a partition, such as a droplet of an emulsion or a well, such that the bead degrades within the partition and any associated species (e.g., oligonucleotides) are released within the droplet when the appropriate stimulus is applied. The free species (e.g., oligonucleotides, nucleic acid molecules) may interact with other reagents contained in the partition. For example, a polyacrylamide bead comprising cystamine and linked, via a disulfide bond, to a barcode sequence, may be combined with a reducing agent within a droplet of a water-in-oil emulsion. Within the droplet, the reducing agent can break the various disulfide bonds, resulting in bead degradation and release of the barcode sequence into the aqueous, inner environment of the droplet. In another example, heating of a droplet comprising a bead-bound barcode sequence in basic solution may also result in bead degradation and release of the attached barcode sequence into the aqueous, inner environment of the droplet.

Any suitable number of molecular tag molecules (e.g., primer, barcoded oligonucleotide) can be associated with a bead such that, upon release from the bead, the molecular tag molecules (e.g., primer, e.g., barcoded oligonucleotide) are present in the partition at a pre-defined concentration. Such pre-defined concentration may be selected to facilitate certain reactions for generating a sequencing library, e.g., amplification, within the partition. In some cases, the pre-defined concentration of the primer can be limited by the process of producing nucleic acid molecule (e.g., oligonucleotide) bearing beads.
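By way of illustration only, the relationship between the number of molecules carried by a bead and the resulting concentration in a partition can be sketched as follows. This is a minimal sketch assuming complete release and uniform mixing within the partition; the function name and the example values (1e8 molecules per bead, a 1 nanoliter partition) are hypothetical and are not taken from this disclosure.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def released_concentration_molar(molecules_per_bead, partition_volume_liters):
    """Molar concentration after all bead-bound molecules are released.

    Assumes (for illustration) complete release and uniform mixing.
    """
    if partition_volume_liters <= 0:
        raise ValueError("partition volume must be positive")
    moles = molecules_per_bead / AVOGADRO
    return moles / partition_volume_liters


# Hypothetical example: 1e8 barcoded oligonucleotides released into 1 nL.
example_concentration = released_concentration_molar(1e8, 1e-9)
example_concentration_nM = example_concentration * 1e9  # roughly 166 nM
```

Under these assumptions, the number of molecules loaded per bead and the partition volume together fix the concentration available for downstream reactions such as amplification.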

In some cases, beads can be non-covalently loaded with one or more reagents. The beads can be non-covalently loaded by, for instance, subjecting the beads to conditions sufficient to swell the beads, allowing sufficient time for the reagents to diffuse into the interiors of the beads, and subjecting the beads to conditions sufficient to de-swell the beads. The swelling of the beads may be accomplished, for instance, by placing the beads in a thermodynamically favorable solvent, subjecting the beads to a higher or lower temperature, subjecting the beads to a higher or lower ion concentration, and/or subjecting the beads to an electric field. The swelling of the beads may be accomplished by various swelling methods. The de-swelling of the beads may be accomplished, for instance, by transferring the beads to a thermodynamically unfavorable solvent, subjecting the beads to lower or higher temperatures, subjecting the beads to a lower or higher ion concentration, and/or removing an electric field. The de-swelling of the beads may be accomplished by various de-swelling methods. Transferring the beads may cause pores in the bead to shrink. The shrinking may then hinder reagents within the beads from diffusing out of the interiors of the beads. The hindrance may be due to steric interactions between the reagents and the interiors of the beads. The transfer may be accomplished microfluidically. For instance, the transfer may be achieved by moving the beads from one co-flowing solvent stream to a different co-flowing solvent stream. The swellability and/or pore size of the beads may be adjusted by changing the polymer composition of the bead.

In some cases, an acrydite moiety linked to a precursor, another species linked to a precursor, or a precursor itself can comprise a labile bond, such as a chemically, thermally, or photo-sensitive bond (e.g., a disulfide bond, UV sensitive bond, or the like). Once acrydite moieties or other moieties comprising a labile bond are incorporated into a bead, the bead may also comprise the labile bond. The labile bond may be, for example, useful in reversibly linking (e.g., covalently linking) species (e.g., barcodes, primers, etc.) to a bead. In some cases, a thermally labile bond may include a nucleic acid hybridization based attachment, e.g., where an oligonucleotide is hybridized to a complementary sequence that is attached to the bead, such that thermal melting of the hybrid releases the oligonucleotide, e.g., a barcode containing sequence, from the bead or microcapsule.

The addition of multiple types of labile bonds to a gel bead may result in the generation of a bead capable of responding to varied stimuli. Each type of labile bond may be sensitive to an associated stimulus (e.g., chemical stimulus, light, temperature, enzymatic, etc.) such that release of species attached to a bead via each labile bond may be controlled by the application of the appropriate stimulus. Such functionality may be useful in controlled release of species from a gel bead. In some cases, another species comprising a labile bond may be linked to a gel bead after gel bead formation via, for example, an activated functional group of the gel bead as described above. As will be appreciated, barcodes that are releasably, cleavably or reversibly attached to the beads described herein include barcodes that are released or releasable through cleavage of a linkage between the barcode molecule and the bead, or that are released through degradation of the underlying bead itself, allowing the barcodes to be accessed or accessible by other reagents, or both.

In addition to thermally cleavable bonds, disulfide bonds and UV sensitive bonds, other non-limiting examples of labile bonds that may be coupled to a precursor or bead include an ester linkage (e.g., cleavable with an acid, a base, or hydroxylamine), a vicinal diol linkage (e.g., cleavable via sodium periodate), a Diels-Alder linkage (e.g., cleavable via heat), a sulfone linkage (e.g., cleavable via a base), a silyl ether linkage (e.g., cleavable via an acid), a glycosidic linkage (e.g., cleavable via an amylase), a peptide linkage (e.g., cleavable via a protease), or a phosphodiester linkage (e.g., cleavable via a nuclease (e.g., DNase)). A bond may be cleavable via other nucleic acid molecule targeting enzymes, such as restriction enzymes (e.g., restriction endonucleases), as described further below.
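By way of illustration only, stimulus-controlled release from a bead carrying several types of labile bonds can be sketched as a lookup from linkage type to cleaving stimulus. The table entries follow the non-limiting examples given above; the function name, the stimulus labels, and the example attachments are hypothetical, and the sketch assumes for simplicity that each stimulus cleaves only the linkages listed for it.

```python
# Linkage type -> stimuli described herein as capable of cleaving it.
CLEAVAGE_STIMULI = {
    "disulfide": {"reducing agent"},
    "ester": {"acid", "base", "hydroxylamine"},
    "vicinal diol": {"sodium periodate"},
    "Diels-Alder": {"heat"},
    "sulfone": {"base"},
    "silyl ether": {"acid"},
    "glycosidic": {"amylase"},
    "peptide": {"protease"},
    "phosphodiester": {"nuclease"},
}


def released_species(attachments, stimulus):
    """Return the names of species whose linkage is cleaved by the stimulus.

    attachments: {species_name: linkage_type} (hypothetical bead description).
    """
    return sorted(
        name
        for name, linkage in attachments.items()
        if stimulus in CLEAVAGE_STIMULI.get(linkage, set())
    )


# Hypothetical bead with three species attached via different labile bonds.
example_bead = {
    "barcode oligonucleotide": "disulfide",
    "primer": "ester",
    "enzyme": "peptide",
}
released_by_reduction = released_species(example_bead, "reducing agent")
released_by_base = released_species(example_bead, "base")
```

In this sketch, applying a reducing agent releases only the disulfide-linked species, while a base releases only the ester-linked species, which mirrors the controlled release described for beads comprising multiple types of labile bonds.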

Species may be encapsulated in beads during bead generation (e.g., during polymerization of precursors). Such species may or may not participate in polymerization. Such species may be entered into polymerization reaction mixtures such that generated beads comprise the species upon bead formation. In some cases, such species may be added to the gel beads after formation. Such species may include, for example, nucleic acid molecules (e.g., oligonucleotides), reagents for a nucleic acid amplification reaction (e.g., primers, polymerases, dNTPs, co-factors (e.g., ionic co-factors), buffers) including those described herein, reagents for enzymatic reactions (e.g., enzymes, co-factors, substrates, buffers), reagents for nucleic acid modification reactions such as polymerization, ligation, or digestion, and/or reagents for template preparation (e.g., tagmentation) for one or more sequencing platforms (e.g., Nextera® for Illumina®). Such species may include one or more enzymes described herein, including without limitation, polymerase, reverse transcriptase, restriction enzymes (e.g., endonuclease), transposase, ligase, proteinase K, DNase, etc. Such species may include one or more reagents described elsewhere herein (e.g., lysis agents, inhibitors, inactivating agents, chelating agents, stimulus). Trapping of such species may be controlled by the polymer network density generated during polymerization of precursors, control of ionic charge within the gel bead (e.g., via ionic species linked to polymerized species), or by the release of other species. Encapsulated species may be released from a bead upon bead degradation and/or by application of a stimulus capable of releasing the species from the bead. Alternatively or in addition, species may be partitioned in a partition (e.g., droplet) during or subsequent to partition formation. Such species may include, without limitation, the abovementioned species that may also be encapsulated in a bead.

A degradable bead may comprise one or more species with a labile bond such that, when the bead/species is exposed to the appropriate stimuli, the bond is broken and the bead degrades. The labile bond may be a chemical bond (e.g., covalent bond, ionic bond) or may be another type of physical interaction (e.g., van der Waals interactions, dipole-dipole interactions, etc.). In some cases, a crosslinker used to generate a bead may comprise a labile bond. Upon exposure to the appropriate conditions, the labile bond can be broken and the bead degraded. For example, upon exposure of a polyacrylamide gel bead comprising cystamine crosslinkers to a reducing agent, the disulfide bonds of the cystamine can be broken and the bead degraded.

A degradable bead may be useful in more quickly releasing an attached species (e.g., a nucleic acid molecule, a barcode sequence, a primer, etc.) from the bead when the appropriate stimulus is applied to the bead as compared to a bead that does not degrade. For example, for a species bound to an inner surface of a porous bead or in the case of an encapsulated species, the species may have greater mobility and accessibility to other species in solution upon degradation of the bead. In some cases, a species may also be attached to a degradable bead via a degradable linker (e.g., disulfide linker). The degradable linker may respond to the same stimuli as the degradable bead or the two degradable species may respond to different stimuli. For example, a barcode sequence may be attached, via a disulfide bond, to a polyacrylamide bead comprising cystamine. Upon exposure of the barcoded bead to a reducing agent, the bead degrades and the barcode sequence is released upon breakage of both the disulfide linkage between the barcode sequence and the bead and the disulfide linkages of the cystamine in the bead.

Where degradable beads are provided, it may be beneficial to avoid exposing such beads to the stimulus or stimuli that cause such degradation prior to a given time, in order to, for example, avoid premature bead degradation and issues that arise from such degradation, including for example poor flow characteristics and aggregation. By way of example, where beads comprise reducible cross-linking groups, such as disulfide groups, it will be desirable to avoid contacting such beads with reducing agents, e.g., DTT or other disulfide cleaving reagents. In such cases, treatment to the beads described herein will, in some cases, be provided free of reducing agents, such as DTT. Because reducing agents are often provided in commercial enzyme preparations, it may be desirable to provide reducing agent free (or DTT free) enzyme preparations in treating the beads described herein. Examples of such enzymes include, e.g., polymerase enzyme preparations, reverse transcriptase enzyme preparations, ligase enzyme preparations, as well as many other enzyme preparations that may be used to treat the beads described herein. The terms “reducing agent free” or “DTT free” preparations can refer to a preparation having less than about 1/10th, less than about 1/50th, or even less than about 1/100th of the lower ranges for such materials used in degrading the beads. For example, for DTT, the reducing agent free preparation can have less than about 0.01 millimolar (mM), 0.005 mM, 0.001 mM, 0.0005 mM, or even less than about 0.0001 mM DTT. In many cases, the amount of DTT can be undetectable.
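By way of illustration only, the "reducing agent free" criterion can be sketched as a comparison against a fraction of the lowest concentration used for bead degradation. This is a minimal sketch; the function name is hypothetical, and the default lower bound of 0.1 mM is an assumption chosen to be consistent with the 0.01 mM example above (1/10th of 0.1 mM), not a value defined as a threshold by this disclosure.

```python
def is_reducing_agent_free(concentration_mM,
                           degradation_lower_bound_mM=0.1,
                           fraction=1 / 10):
    """Return True if the concentration is below the chosen fraction of the
    lowest concentration used for degrading the beads.

    degradation_lower_bound_mM: assumed lower end of the degrading range.
    fraction: e.g., 1/10, 1/50, or 1/100 as described in the text.
    """
    if concentration_mM < 0:
        raise ValueError("concentration cannot be negative")
    return concentration_mM < degradation_lower_bound_mM * fraction


# Hypothetical enzyme preparations with different residual DTT levels (mM).
prep_a_ok = is_reducing_agent_free(0.005)                  # below 1/10th bound
prep_b_ok = is_reducing_agent_free(0.05)                   # above 1/10th bound
prep_c_ok = is_reducing_agent_free(0.0005, fraction=1 / 100)
```

The stricter fractions (1/50th, 1/100th) simply lower the permitted residual concentration for the same assumed degrading range.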

Numerous chemical triggers may be used to trigger the degradation of beads. Examples of these chemical changes may include, but are not limited to, pH-mediated changes to the integrity of a component within the bead, degradation of a component of a bead via cleavage of cross-linked bonds, and depolymerization of a component of a bead.

In some embodiments, a bead may be formed from materials that comprise degradable chemical crosslinkers, such as BAC or cystamine. Degradation of such degradable crosslinkers may be accomplished through a number of mechanisms. In some examples, a bead may be contacted with a chemical degrading agent that may induce oxidation, reduction or other chemical changes. For example, a chemical degrading agent may be a reducing agent, such as dithiothreitol (DTT). Additional examples of reducing agents may include β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl) phosphine (TCEP), or combinations thereof. A reducing agent may degrade the disulfide bonds formed between gel precursors forming the bead, and thus, degrade the bead. In other cases, a change in pH of a solution, such as an increase in pH, may trigger degradation of a bead. In other cases, exposure to an aqueous solution, such as water, may trigger hydrolytic degradation, and thus degradation of the bead. In some cases, any combination of stimuli may trigger degradation of a bead. For example, a change in pH may enable a chemical agent (e.g., DTT) to become an effective reducing agent.

Beads may also be induced to release their contents upon the application of a thermal stimulus. A change in temperature can cause a variety of changes to a bead. For example, heat can cause a solid bead to liquefy. A change in heat may cause melting of a bead such that a portion of the bead degrades. In other cases, heat may increase the internal pressure of the bead components such that the bead ruptures or explodes. Heat may also act upon heat-sensitive polymers used as materials to construct beads.

Any suitable agent may degrade beads. In some embodiments, changes in temperature or pH may be used to degrade thermo-sensitive or pH-sensitive bonds within beads. In some embodiments, chemical degrading agents may be used to degrade chemical bonds within beads by oxidation, reduction or other chemical changes. For example, a chemical degrading agent may be a reducing agent, such as DTT, wherein DTT may degrade the disulfide bonds formed between a crosslinker and gel precursors, thus degrading the bead. In some embodiments, a reducing agent may be added to degrade the bead, which may or may not cause the bead to release its contents. Examples of reducing agents may include dithiothreitol (DTT), β-mercaptoethanol, (2S)-2-amino-1,4-dimercaptobutane (dithiobutylamine or DTBA), tris(2-carboxyethyl) phosphine (TCEP), or combinations thereof. The reducing agent may be present at a concentration of about 0.1 mM, 0.5 mM, 1 mM, 5 mM, or 10 mM. The reducing agent may be present at a concentration of at least about 0.1 mM, 0.5 mM, 1 mM, 5 mM, 10 mM, or greater than 10 mM. The reducing agent may be present at a concentration of at most about 10 mM, 5 mM, 1 mM, 0.5 mM, 0.1 mM, or less.

Although FIG. 1 and FIG. 2 have been described above in terms of providing substantially singly occupied partitions, in certain cases, it may be desirable to provide multiply occupied partitions, e.g., containing two, three, four or more cells and/or microcapsules (e.g., beads) comprising barcoded nucleic acid molecules (e.g., oligonucleotides) within a single partition. Accordingly, as noted above, the flow characteristics of the cell and/or bead containing fluids and partitioning fluids may be controlled to provide for such multiply occupied partitions. In particular, the flow parameters may be controlled to provide a given occupancy rate at greater than about 50% of the partitions, greater than about 75%, and in some cases greater than about 80%, 90%, 95%, or higher.

In some cases, additional microcapsules can be used to deliver additional reagents to a partition. In such cases, it may be advantageous to introduce different beads into a common channel or droplet generation junction, from different bead sources (e.g., containing different associated reagents) through different channel inlets into such common channel or droplet generation junction (e.g., junction 210). In such cases, the flow and frequency of the different beads into the channel or junction may be controlled to provide for a certain ratio of microcapsules from each source, while ensuring a given pairing or combination of such beads into a partition with a given number of cells (e.g., one cell and one bead per partition).

The partitions described herein may comprise small volumes, for example, less than about 10 microliters (μL), 5 μL, 1 μL, 500 nanoliters (nL), 100 nL, 50 nL, 900 picoliters (pL), 800 pL, 700 pL, 600 pL, 500 pL, 400 pL, 300 pL, 200 pL, 100 pL, 50 pL, 20 pL, 10 pL, 1 pL, or less.

For example, in the case of droplet based partitions, the droplets may have overall volumes that are less than about 1000 pL, 900 pL, 800 pL, 700 pL, 600 pL, 500 pL, 400 pL, 300 pL, 200 pL, 100 pL, 50 pL, 20 pL, 10 pL, 1 pL, or less. Where co-partitioned with microcapsules, it will be appreciated that the sample fluid volume, e.g., including co-partitioned biological particles and/or beads, within the partitions may be less than about 90% of the above described volumes, less than about 80%, less than about 70%, less than about 60%, less than about 50%, less than about 40%, less than about 30%, less than about 20%, or less than about 10% of the above described volumes.
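By way of illustration only, the volume units used above and the sample fluid fraction within a droplet can be related as follows. This is a minimal sketch; the function names and the example droplet (1000 pL with a 40% sample fluid fraction) are hypothetical.

```python
# Conversion factors to picoliters for the units used in this section.
PICOLITERS_PER_UNIT = {
    "pL": 1.0,
    "nL": 1_000.0,
    "uL": 1_000_000.0,  # "uL" stands in for microliters (μL)
}


def to_picoliters(value, unit):
    """Convert a volume in pL, nL, or uL to picoliters."""
    return value * PICOLITERS_PER_UNIT[unit]


def sample_fluid_volume_pl(droplet_volume_pl, sample_fraction):
    """Sample fluid volume when only part of the droplet is sample fluid.

    sample_fraction: fraction of the droplet volume (between 0 and 1).
    """
    if not 0 <= sample_fraction <= 1:
        raise ValueError("sample_fraction must be between 0 and 1")
    return droplet_volume_pl * sample_fraction


# Hypothetical example: a 1 nL droplet is 1000 pL; at a 40% sample
# fraction the sample fluid occupies about 400 pL.
droplet_pl = to_picoliters(1, "nL")
sample_pl = sample_fluid_volume_pl(droplet_pl, 0.40)
```

The conversion also makes the relative scale of the listed volumes explicit: 50 nL corresponds to 50,000 pL, which is larger than any of the picoliter-scale droplet volumes listed.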

As is described elsewhere herein, partitioning species may generate a population or plurality of partitions. In such cases, any suitable number of partitions can be generated or otherwise provided. For example, at least about 1,000 partitions, at least about 5,000 partitions, at least about 10,000 partitions, at least about 50,000 partitions, at least about 100,000 partitions, at least about 500,000 partitions, at least about 1,000,000 partitions, at least about 5,000,000 partitions, at least about 10,000,000 partitions, at least about 50,000,000 partitions, at least about 100,000,000 partitions, at least about 500,000,000 partitions, at least about 1,000,000,000 partitions, or more partitions can be generated or otherwise provided. Moreover, the plurality of partitions may comprise both unoccupied partitions (e.g., empty partitions) and occupied partitions.

Reagents

In accordance with certain aspects, the cells may be partitioned along with lysis reagents in order to release the contents of the cells within the partition. In such cases, the lysis agents can be contacted with the cell suspension concurrently with, or immediately prior to, the introduction of the cells into the partitioning junction/droplet generation zone (e.g., junction 210), such as through an additional channel or channels upstream of the channel junction. Beneficially, when lysis reagents and biological particles are co-partitioned, the lysis reagents can facilitate the release of the contents of the biological particles within the partition. The contents released in a partition may remain discrete from the contents of other partitions.

Examples of lysis agents include bioactive reagents, such as lysis enzymes that are used for lysis of different cell types, e.g., gram positive or negative bacteria, plants, yeast, mammalian, etc., such as lysozymes, achromopeptidase, lysostaphin, labiase, kitalase, lyticase, and a variety of other lysis enzymes available from, e.g., Sigma-Aldrich, Inc. (St Louis, Mo.), as well as other commercially available lysis enzymes. Other lysis agents may additionally or alternatively be co-partitioned with the cells to cause the release of the cell's contents into the partitions. For example, in some cases, surfactant-based lysis solutions may be used to lyse cells, although these may be less desirable for emulsion based systems where the surfactants can interfere with stable emulsions. In some cases, lysis solutions may include non-ionic surfactants such as, for example, Triton X-100 and Tween 20. In some cases, lysis solutions may include ionic surfactants such as, for example, sarcosyl and sodium dodecyl sulfate (SDS). Electroporation, thermal, acoustic or mechanical cellular disruption may also be used in certain cases, e.g., non-emulsion based partitioning such as encapsulation of cells that may be in addition to or in place of droplet partitioning, where any pore size of the encapsulate is sufficiently small to retain nucleic acid fragments of a given size, following cellular disruption.

Alternatively or in addition to the lysis agents co-partitioned with the cells described above, other reagents can also be co-partitioned with the cells, including, for example, DNase and RNase inactivating agents or inhibitors, such as proteinase K, chelating agents, such as EDTA, and other reagents employed in removing or otherwise reducing negative activity or impact of different cell lysate components on subsequent processing of nucleic acids. In addition, in the case of encapsulated cells, the cells may be exposed to an appropriate stimulus to release the cells or their contents from a co-partitioned microcapsule. For example, in some cases, a chemical stimulus may be co-partitioned along with an encapsulated cell to allow for the degradation of the microcapsule and release of the cell or its contents into the larger partition. In some cases, this stimulus may be the same as the stimulus described elsewhere herein for release of nucleic acid molecules (e.g., oligonucleotides) from their respective microcapsule (e.g., bead). In alternative aspects, this may be a different and non-overlapping stimulus, in order to allow an encapsulated cell to be released into a partition at a different time from the release of nucleic acid molecules into the same partition.

As will be appreciated, a number of other reagents may be co-partitioned along with the cells, beads, lysis agents and chemical stimuli, including, for example, protective reagents, like proteinase K, chelators, nucleic acid extension, replication, transcription or amplification reagents such as polymerases, reverse transcriptases, transposases which can be used for transposon based methods (e.g., Nextera), nucleoside triphosphates or NTP analogues, primer sequences and additional cofactors such as divalent metal ions used in such reactions, ligation reaction reagents, such as ligase enzymes and ligation sequences, dyes, labels, or other tagging reagents.

Additional reagents may also be co-partitioned with the cells, such as endonucleases to fragment a cell's DNA, DNA polymerase enzymes and dNTPs used to amplify the cell's nucleic acid fragments and to attach the barcode molecular tags to the amplified fragments. Other enzymes may be co-partitioned, including without limitation, polymerase, transposase, ligase, proteinase K, DNase, etc. Additional reagents may also include reverse transcriptase enzymes, including enzymes with terminal transferase activity, primers and oligonucleotides, and switch oligonucleotides (also referred to herein as “switch oligos” or “template switching oligonucleotides”) which can be used for template switching. In some cases, template switching can be used to increase the length of a cDNA. In some cases, template switching can be used to append a predefined nucleic acid sequence to the cDNA. In an example of template switching, cDNA can be generated from reverse transcription of a template, e.g., cellular mRNA, where a reverse transcriptase with terminal transferase activity can add additional nucleotides, e.g., polyC, to the cDNA in a template independent manner. Switch oligos can include sequences complementary to the additional nucleotides, e.g., polyG. The additional nucleotides (e.g., polyC) on the cDNA can hybridize to the additional nucleotides (e.g., polyG) on the switch oligo, whereby the switch oligo can be used by the reverse transcriptase as template to further extend the cDNA. Template switching oligonucleotides may comprise a hybridization region and a template region. The hybridization region can comprise any sequence capable of hybridizing to the target. In some cases, as previously described, the hybridization region comprises a series of G bases to complement the overhanging C bases at the 3′ end of a cDNA molecule. The series of G bases may comprise 1 G base, 2 G bases, 3 G bases, 4 G bases, 5 G bases or more than 5 G bases. The template sequence can comprise any sequence to be incorporated into the cDNA. In some cases, the template region comprises at least 1 (e.g., at least 2, 3, 4, 5 or more) tag sequences and/or functional sequences. Switch oligos may comprise deoxyribonucleic acids; ribonucleic acids; modified nucleic acids including 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), inverted dT, 5-Methyl dC, 2′-deoxyinosine, Super T (5-hydroxybutynl-2′-deoxyuridine), Super G (8-aza-7-deazaguanosine), locked nucleic acids (LNAs), unlocked nucleic acids (UNAs, e.g., UNA-A, UNA-U, UNA-C, UNA-G), Iso-dG, Iso-dC, 2′ Fluoro bases (e.g., Fluoro C, Fluoro U, Fluoro A, and Fluoro G), or any combination.
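By way of illustration only, the template switching steps described above can be sketched with sequences modeled as plain strings. The function names, the three-base polyC tail, and the example sequences are hypothetical and chosen for readability; the sketch makes no attempt to model enzyme kinetics or the efficiency of non-templated addition.

```python
# Toy string model of template switching (illustrative assumptions throughout).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def first_strand_cdna(mrna: str, n_tail: int = 3) -> str:
    """Reverse transcribe an mRNA (5'->3') into cDNA (5'->3') and append a
    non-templated polyC tail to the 3' end of the cDNA."""
    return reverse_complement(mrna) + "C" * n_tail


def template_switch(cdna: str, switch_oligo: str, n_g: int = 3) -> str:
    """Extend the cDNA using a switch oligo as template.

    The switch oligo is written 5'->3' as template region followed by the
    G bases of the hybridization region (an assumed layout)."""
    if not (switch_oligo.endswith("G" * n_g) and cdna.endswith("C" * n_g)):
        raise ValueError("polyG of switch oligo cannot pair with polyC of cDNA")
    template_region = switch_oligo[:-n_g]
    # Copying the template region appends its reverse complement to the cDNA.
    return cdna + reverse_complement(template_region)


cdna = first_strand_cdna("AUGGCC")                 # "GGCCATCCC"
extended = template_switch(cdna, "ACGTACGGG")      # "GGCCATCCCGTACGT"
```

In this toy model the extended cDNA ends with the reverse complement of the template region, which is how a predefined sequence carried by the switch oligo becomes appended to the cDNA.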

In some cases, the length of a switch oligo may be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250 nucleotides or longer.

In some cases, the length of a switch oligo may be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249 or 250 nucleotides or longer.

In some cases, the length of a switch oligo may be at most about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249 or 250 nucleotides.

Once the contents of the cells are released into their respective partitions, the nucleic acids contained therein may be further processed within the partitions. In accordance with the methods and systems described herein, the nucleic acid contents of individual cells can be provided with unique identifiers such that, upon characterization of those nucleic acids, they may be attributed as having been derived from the same cell or cells. The ability to attribute characteristics to individual cells or groups of cells is provided by the assignment of unique identifiers specifically to an individual cell or groups of cells, which is another advantageous aspect of the methods and systems described herein. In particular, unique identifiers, e.g., in the form of nucleic acid barcodes, are assigned or associated with individual cells or populations of cells, in order to tag or label the cell's components (and as a result, its characteristics) with the unique identifiers. These unique identifiers are then used to attribute the cell's components and characteristics to an individual cell or group of cells.

In some aspects, this is carried out by co-partitioning the individual cells or groups of cells with the unique identifiers, such as described above (with reference to FIG. 2). In some aspects, the unique identifiers are provided in the form of nucleic acid molecules (e.g., oligonucleotides) that comprise nucleic acid barcode sequences that may be attached to or otherwise associated with the nucleic acid contents of individual cells, or to other components of the cells, and particularly to fragments of those nucleic acids. The nucleic acid molecules are partitioned such that as between nucleic acid molecules in a given partition, the nucleic acid barcode sequences contained therein are the same, but as between different partitions, the nucleic acid molecules can, and do, have differing barcode sequences, or at least represent a large number of different barcode sequences across all of the partitions in a given analysis. In some aspects, only one nucleic acid barcode sequence can be associated with a given partition, although in some cases, two or more different barcode sequences may be present.

The nucleic acid barcode sequences can include from about 6 to about 20 or more nucleotides within the sequence of the nucleic acid molecules (e.g., oligonucleotides). The nucleic acid barcode sequences can include from about 6 to about 20, 30, 40, 50, 60, 70, 80, 90, 100 or more nucleotides. In some cases, the length of a barcode sequence may be about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or longer. In some cases, the length of a barcode sequence may be at least about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or longer. In some cases, the length of a barcode sequence may be at most about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or shorter. These nucleotides may be completely contiguous, i.e., in a single stretch of adjacent nucleotides, or they may be separated into two or more separate subsequences that are separated by 1 or more nucleotides. In some cases, separated barcode subsequences can be from about 4 to about 16 nucleotides in length. In some cases, the barcode subsequence may be about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or longer. In some cases, the barcode subsequence may be at least about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or longer. In some cases, the barcode subsequence may be at most about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or shorter.
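As a rough aid to relating barcode length to library diversity, the following sketch counts the distinct sequences available at a given length. It assumes, for illustration only, that every position may independently be any of the four canonical bases; practical barcode sets are typically smaller than this upper bound, and the function names are hypothetical.

```python
def barcode_space(length: int) -> int:
    """Upper bound on distinct barcodes of a given length (4 bases/position)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def min_barcode_length(required_barcodes: int) -> int:
    """Smallest length whose sequence space holds the required diversity."""
    length = 1
    while barcode_space(length) < required_barcodes:
        length += 1
    return length


print(barcode_space(6))                # 4096
print(barcode_space(16))               # 4294967296
print(min_barcode_length(10_000_000))  # 12
```

Under this assumption, a barcode formed from two separated subsequences has the same upper bound as a contiguous barcode of the combined length.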

The co-partitioned oligonucleotides can also comprise other functional sequences useful in the processing of the nucleic acids from the co-partitioned cells. These sequences include, e.g., targeted or random/universal amplification primer sequences for amplifying the genomic DNA from the individual cells within the partitions while attaching the associated barcode sequences, sequencing primers or primer recognition sites, hybridization or probing sequences, e.g., for identification of presence of the sequences or for pulling down barcoded nucleic acids, or any of a number of other potential functional sequences. Other mechanisms of co-partitioning oligonucleotides may also be employed, including, e.g., coalescence of two or more droplets, where one droplet contains oligonucleotides, or microdispensing of oligonucleotides into partitions, e.g., droplets within microfluidic systems.

In an example, microcapsules, such as beads, are provided that each include large numbers of the above described barcoded nucleic acid molecules (e.g., barcoded oligonucleotides) releasably attached to the beads, where all of the nucleic acid molecules attached to a particular bead will include the same nucleic acid barcode sequence, but where a large number of diverse barcode sequences are represented across the population of beads used. In some embodiments, hydrogel beads, e.g., comprising polyacrylamide polymer matrices, are used as a solid support and delivery vehicle for the nucleic acid molecules into the partitions, as they are capable of carrying large numbers of nucleic acid molecules, and may be configured to release those nucleic acid molecules upon exposure to a particular stimulus, as described elsewhere herein. In some cases, the population of beads provides a diverse barcode sequence library that includes at least about 1,000 different barcode sequences, at least about 5,000 different barcode sequences, at least about 10,000 different barcode sequences, at least about 50,000 different barcode sequences, at least about 100,000 different barcode sequences, at least about 1,000,000 different barcode sequences, at least about 5,000,000 different barcode sequences, or at least about 10,000,000 different barcode sequences, or more. Additionally, each bead can be provided with large numbers of nucleic acid (e.g., oligonucleotide) molecules attached. In particular, the number of nucleic acid molecules including the barcode sequence on an individual bead can be at least about 1,000 nucleic acid molecules, at least about 5,000 nucleic acid molecules, at least about 10,000 nucleic acid molecules, at least about 50,000 nucleic acid molecules, at least about 100,000 nucleic acid molecules, at least about 500,000 nucleic acid molecules, at least about 1,000,000 nucleic acid molecules, at least about 5,000,000 nucleic acid molecules, at least about 10,000,000 nucleic acid molecules, at least about 50,000,000 nucleic acid molecules, at least about 100,000,000 nucleic acid molecules, at least about 250,000,000 nucleic acid molecules and in some cases at least about 1 billion nucleic acid molecules, or more. Nucleic acid molecules of a given bead can include identical (or common) barcode sequences, different barcode sequences, or a combination of both. Nucleic acid molecules of a given bead can include multiple sets of nucleic acid molecules. Nucleic acid molecules of a given set can include identical barcode sequences. The identical barcode sequences can be different from barcode sequences of nucleic acid molecules of another set.

Moreover, when the population of beads is partitioned, the resulting population of partitions can also include a diverse barcode library that includes at least about 1,000 different barcode sequences, at least about 5,000 different barcode sequences, at least about 10,000 different barcode sequences, at least about 50,000 different barcode sequences, at least about 100,000 different barcode sequences, at least about 1,000,000 different barcode sequences, at least about 5,000,000 different barcode sequences, or at least about 10,000,000 different barcode sequences. Additionally, each partition of the population can include at least about 1,000 nucleic acid molecules, at least about 5,000 nucleic acid molecules, at least about 10,000 nucleic acid molecules, at least about 50,000 nucleic acid molecules, at least about 100,000 nucleic acid molecules, at least about 500,000 nucleic acid molecules, at least about 1,000,000 nucleic acid molecules, at least about 5,000,000 nucleic acid molecules, at least about 10,000,000 nucleic acid molecules, at least about 50,000,000 nucleic acid molecules, at least about 100,000,000 nucleic acid molecules, at least about 250,000,000 nucleic acid molecules and in some cases at least about 1 billion nucleic acid molecules.

In some cases, it may be desirable to incorporate multiple different barcodes within a given partition, either attached to a single or multiple beads within the partition. For example, in some cases, a mixed, but known set of barcode sequences may provide greater assurance of identification in the subsequent processing, e.g., by providing a stronger address or attribution of the barcodes to a given partition, as a duplicate or independent confirmation of the output from a given partition.

The nucleic acid molecules (e.g., oligonucleotides) are releasable from the beads upon the application of a particular stimulus to the beads. In some cases, the stimulus may be a photo-stimulus, e.g., through cleavage of a photo-labile linkage that releases the nucleic acid molecules. In other cases, a thermal stimulus may be used, where elevation of the temperature of the beads' environment will result in cleavage of a linkage or other release of the nucleic acid molecules from the beads. In still other cases, a chemical stimulus can be used that cleaves a linkage of the nucleic acid molecules to the beads, or otherwise results in release of the nucleic acid molecules from the beads. In one case, such compositions include the polyacrylamide matrices described above for encapsulation of biological particles, and may be degraded for release of the attached nucleic acid molecules through exposure to a reducing agent, such as DTT.

In some aspects, provided are systems and methods for controlled partitioning. Droplet size may be controlled by adjusting certain geometric features in channel architecture (e.g., microfluidics channel architecture). For example, an expansion angle, width, and/or length of a channel may be adjusted to control droplet size.

FIG. 8 shows images of individual Jurkat cells co-partitioned along with barcode oligonucleotide containing beads in aqueous droplets in an aqueous in oil emulsion. As illustrated, individual cells may be readily co-partitioned with individual beads. As will be appreciated, optimization of individual cell loading may be carried out by a number of methods, including by providing dilutions of cell populations into the microfluidic system in order to achieve the desired cell loading per partition as described elsewhere herein.

In operation, once lysed, the nucleic acid contents of the individual cells are then available for further processing within the partitions, including, e.g., fragmentation, amplification and barcoding, as well as attachment of other functional sequences. As noted above, fragmentation may be accomplished through the co-partitioning of shearing enzymes, such as endonucleases, in order to fragment the nucleic acids into smaller fragments. These endonucleases may include restriction endonucleases, including type II and type IIs restriction endonucleases as well as other nucleic acid cleaving enzymes, such as nicking endonucleases, and the like. In some cases, fragmentation may not be desired, and full length nucleic acids may be retained within the partitions, or in the case of encapsulated cells or cell contents, fragmentation may be carried out prior to partitioning, e.g., through enzymatic methods, e.g., those described herein, or through mechanical methods, e.g., mechanical, acoustic or other shearing.

Once co-partitioned, and once the cells are lysed to release their nucleic acids, the oligonucleotides disposed upon the bead may be used to barcode and amplify fragments of those nucleic acids. A particularly elegant process for use of these barcode oligonucleotides in amplifying and barcoding fragments of sample nucleic acids is described in detail in U.S. Patent Publication No. US 2014/0378345, filed Jun. 26, 2014, and incorporated by reference herein. Briefly, in one aspect, the oligonucleotides present on the beads that are co-partitioned with the cells are released from their beads into the partition with the cell's nucleic acids. The oligonucleotides can include, along with the barcode sequence, a primer sequence at its 5′ end. This primer sequence may be a random oligonucleotide sequence intended to randomly prime numerous different regions on the cell's nucleic acids, or it may be a specific primer sequence targeted to prime upstream of a specific targeted region of the cell's genome.

Once released, the primer portion of the oligonucleotide can anneal to a complementary region of the cell's nucleic acid. Extension reaction reagents, e.g., DNA polymerase, nucleoside triphosphates, co-factors (e.g., Mg²⁺ or Mn²⁺), that are also co-partitioned with the cells and beads, then extend the primer sequence using the cell's nucleic acid as a template, to produce a complementary fragment to the strand of the cell's nucleic acid to which the primer annealed, which complementary fragment includes the oligonucleotide and its associated barcode sequence. Annealing and extension of multiple primers to different portions of the cell's nucleic acids will result in a large pool of overlapping complementary fragments of the nucleic acid, each possessing its own barcode sequence indicative of the partition in which it was created. In some cases, these complementary fragments may themselves be used as a template primed by the oligonucleotides present in the partition to produce a complement of the complement that again includes the barcode sequence. In some cases, this replication process is configured such that when the first complement is duplicated, it produces two complementary sequences at or near its termini, to allow formation of a hairpin structure or partial hairpin structure that reduces the ability of the molecule to be the basis for producing further iterative copies. As described herein, the cell's nucleic acids may include any desired nucleic acids within the cell including, for example, the cell's DNA, e.g., genomic DNA, RNA, e.g., messenger RNA, and the like. For example, in some cases, the methods and systems described herein are used in characterizing expressed mRNA, including, e.g., the presence and quantification of such mRNA, and may include RNA sequencing processes as the characterization process. Alternatively or additionally, the reagents partitioned along with the cells may include reagents for the conversion of mRNA into cDNA, e.g., reverse transcriptase enzymes and reagents, to facilitate sequencing processes where DNA sequencing is employed. Where the nucleic acids to be characterized comprise RNA, e.g., mRNA, a schematic illustration of one example of this is shown in FIG. 3.

As shown, oligonucleotides that include a barcode sequence are co-partitioned in, e.g., a droplet 302 in an emulsion, along with a sample nucleic acid 304. As noted elsewhere herein, the oligonucleotides 308 may be provided on a bead 306 that is co-partitioned with the sample nucleic acid 304, which oligonucleotides are releasable from the bead 306, as shown in panel A. The oligonucleotides 308 include a barcode sequence 312, in addition to one or more functional sequences, e.g., sequences 310, 314 and 316. For example, oligonucleotide 308 is shown as comprising barcode sequence 312, as well as sequence 310 that may function as an attachment or immobilization sequence for a given sequencing system, e.g., a P5 sequence used for attachment in flow cells of an Illumina Hiseq® or Miseq® system. As shown, the oligonucleotides also include a primer sequence 316, which may include a random or targeted N-mer for priming replication of portions of the sample nucleic acid 304. Also included within oligonucleotide 308 is a sequence 314 which may provide a sequencing priming region, such as a “read1” or R1 priming region, that is used to prime polymerase mediated, template directed sequencing by synthesis reactions in sequencing systems. As will be appreciated, the functional sequences may be selected to be compatible with a variety of different sequencing systems, e.g., 454 Sequencing, Ion Torrent Proton or PGM, Illumina X10, etc., and the requirements thereof. In many cases, the barcode sequence 312, immobilization sequence 310 and R1 sequence 314 may be common to all of the oligonucleotides attached to a given bead. The primer sequence 316 may vary for random N-mer primers, or may be common to the oligonucleotides on a given bead for certain targeted applications.
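The arrangement of segments just described can be sketched as a small data structure. This is an illustration under stated assumptions only: the segment order (310, 312, 314, 316), the placeholder sequences, the class and function names, and the use of a seeded pseudo-random generator for the N-mers are all hypothetical and are not actual adapter sequences.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BarcodedOligo:
    attachment: str  # sequence 310, e.g., a flow cell attachment sequence
    barcode: str     # sequence 312, common to all oligos on one bead
    read1: str       # sequence 314, sequencing primer region
    primer: str      # sequence 316, random or targeted N-mer

    @property
    def sequence(self) -> str:
        """Full oligonucleotide, assuming the order 310-312-314-316."""
        return self.attachment + self.barcode + self.read1 + self.primer


def make_bead_oligos(attachment: str, barcode: str, read1: str,
                     n_oligos: int, nmer_len: int = 10,
                     seed: int = 0) -> List[BarcodedOligo]:
    """Build the oligos of one bead: common segments plus random N-mers."""
    rng = random.Random(seed)
    return [
        BarcodedOligo(attachment, barcode, read1,
                      "".join(rng.choice("ACGT") for _ in range(nmer_len)))
        for _ in range(n_oligos)
    ]


# Placeholder segments; every oligo on this hypothetical bead shares them.
bead = make_bead_oligos("AAAA", "ACGTACGT", "CCCC", n_oligos=5, nmer_len=6)
```

For a targeted application, the same structure could be filled with one common primer in place of the random N-mers.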

As will be appreciated, in some cases, the functional sequences may include primer sequences useful for RNA-seq applications. For example, in some cases, the oligonucleotides may include poly-dT primers for priming reverse transcription of RNA for RNA-seq. In still other cases, oligonucleotides in a given partition, e.g., included on an individual bead, may include multiple types of primer sequences in addition to the common barcode sequences, such as both DNA-sequencing and RNA sequencing primers, e.g., poly-dT primer sequences included within the oligonucleotides coupled to the bead. In such cases, a single partitioned cell may be subjected to both DNA and RNA sequencing processes.

Based upon the presence of primer sequence 316, the oligonucleotides can prime the sample nucleic acid as shown in panel B, which allows for extension of the oligonucleotides 308 and 308a using polymerase enzymes and other extension reagents also co-partitioned with the bead 306 and sample nucleic acid 304. As shown in panel C, following extension of the oligonucleotides that, for random N-mer primers, would anneal to multiple different regions of the sample nucleic acid 304, multiple overlapping complements or fragments of the nucleic acid are created, e.g., fragments 318 and 320. Although including sequence portions that are complementary to portions of sample nucleic acid, e.g., sequences 322 and 324, these constructs are generally referred to herein as comprising fragments of the sample nucleic acid 304, having the attached barcode sequences.

The barcoded nucleic acid fragments may then be subjected to characterization, e.g., through sequence analysis, or they may be further amplified in the process, as shown in panel D. For example, additional oligonucleotides, e.g., oligonucleotide 308b, also released from bead 306, may prime the fragments 318 and 320. This is shown for fragment 318. In particular, again, based upon the presence of the random N-mer primer 316b in oligonucleotide 308b (which in many cases can be different from other random N-mers in a given partition, e.g., primer sequence 316), the oligonucleotide anneals with the fragment 318, and is extended to create a complement 326 to at least a portion of fragment 318 which includes sequence 328, that comprises a duplicate of a portion of the sample nucleic acid sequence. Extension of the oligonucleotide 308b continues until it has replicated through the oligonucleotide portion 308 of fragment 318. As noted elsewhere herein, and as illustrated in panel D, the oligonucleotides may be configured to prompt a stop in the replication by the polymerase at a desired point, e.g., after replicating through sequences 316 and 314 of oligonucleotide 308 that is included within fragment 318. As described herein, this may be accomplished by different methods, including, for example, the incorporation of different nucleotides and/or nucleotide analogues that are not capable of being processed by the polymerase enzyme used. For example, this may include the inclusion of uracil containing nucleotides within the sequence region 312 to cause a non-uracil tolerant polymerase to cease replication of that region. As a result, a fragment 326 is created that includes the full-length oligonucleotide 308b at one end, including the barcode sequence 312, the attachment sequence 310, the R1 primer region 314, and the random N-mer sequence 316b. At the other end of the sequence may be included the complement 316′ to the random N-mer of the first oligonucleotide 308, as well as a complement to all or a portion of the R1 sequence, shown as sequence 314′. The R1 sequence 314 and its complement 314′ are then able to hybridize together to form a partial hairpin structure 328. As will be appreciated, because the random N-mers differ among different oligonucleotides, these sequences and their complements would not be expected to participate in hairpin formation, e.g., sequence 316′, which is the complement to random N-mer 316, would not be expected to be complementary to random N-mer sequence 316b. This would not be the case for other applications, e.g., targeted primers, where the N-mers would be common among oligonucleotides within a given partition.
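The reasoning about which segments can and cannot pair within fragment 326 may be checked with a toy string model. The sketch below assumes perfect reverse complementarity as the criterion for hybridization and uses short placeholder sequences; the names and the simple substring test are hypothetical simplifications and do not model duplex thermodynamics.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_form_hairpin(fragment: str, stem: str) -> bool:
    """True if the stem is followed, further 3', by its reverse complement."""
    start = fragment.find(stem)
    if start < 0:
        return False
    downstream = fragment[start + len(stem):]
    return reverse_complement(stem) in downstream


# Placeholder segments (hypothetical, for illustration only).
ATTACHMENT = "GATTACA"     # sequence 310
BARCODE = "ACGTACGT"       # sequence 312
READ1 = "CTCTTTCC"         # sequence 314
NMER_A = "ACCTGA"          # random N-mer 316 of the first oligonucleotide
NMER_B = "GGATCA"          # random N-mer 316b of the second oligonucleotide
SAMPLE_COPY = "TTGACCAGT"  # duplicated portion of the sample sequence

# Fragment 326: full oligo 308b, sample copy, then 316' and 314'.
fragment_326 = (ATTACHMENT + BARCODE + READ1 + NMER_B + SAMPLE_COPY
                + reverse_complement(NMER_A) + reverse_complement(READ1))

print(can_form_hairpin(fragment_326, READ1))   # True: 314 pairs with 314'
print(can_form_hairpin(fragment_326, NMER_B))  # False: 316b does not pair with 316'
```

If the two N-mers were identical, as with a common targeted primer, the second check would also return True in this model.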

Forming these partial hairpin structures allows for the removal of first level duplicates of the sample sequence from further replication, e.g., preventing iterative copying of copies. The partial hairpin structure also provides a useful structure for subsequent processing of the created fragments, e.g., fragment 326.

In general, the amplification of the cell's nucleic acids is carried out until the barcoded overlapping fragments within the partition constitute at least 1× coverage of the particular portion or all of the cell's genome, at least 2×, at least 3×, at least 4×, at least 5×, at least 10×, at least 20×, at least 40× or more coverage of the genome or its relevant portion of interest. Once the barcoded fragments are produced, they may be directly sequenced on an appropriate sequencing system, e.g., an Illumina Hiseq®, Miseq® or X10 system, or they may be subjected to additional processing, such as further amplification, attachment of other functional sequences, e.g., second sequencing primers, for reverse reads, sample index sequences, and the like.
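By way of illustration only, the coverage threshold described above can be read as the total number of barcoded fragment bases divided by the length of the region of interest. The following is a minimal sketch under that assumption (a simple average-depth definition of coverage); the function names and the example fragment lengths are hypothetical and are not taken from the disclosure.

```python
def mean_coverage(fragment_lengths, region_length):
    """Average depth: total fragment bases divided by the region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return sum(fragment_lengths) / region_length


def meets_coverage(fragment_lengths, region_length, target_fold):
    """True once the fragments reach at least target_fold coverage."""
    return mean_coverage(fragment_lengths, region_length) >= target_fold


# Hypothetical example: 50 fragments of 400 bases over a 10,000-base region.
lengths = [400] * 50
print(mean_coverage(lengths, 10_000))       # 2.0
print(meets_coverage(lengths, 10_000, 2))   # True
print(meets_coverage(lengths, 10_000, 5))   # False
```

An average-depth figure of this kind does not show whether every position is covered; a per-position depth calculation would be needed for that.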

All of the fragments from multiple different partitions may then bepooled for sequencing on high throughput sequencers as described herein,where the pooled fragments comprise a large number of fragments derivedfrom the nucleic acids of different cells or small cell populations, butwhere the fragments from the nucleic acids of a given cell will sharethe same barcode sequence. In particular, because each fragment is codedas to its partition of origin, and consequently its single cell or smallpopulation of cells, the sequence of that fragment may be attributedback to that cell or those cells based upon the presence of the barcode,which will also aid in applying the various sequence fragments frommultiple partitions to assembly of individual genomes for differentcells. This is schematically illustrated in FIG. 4. As shown in oneexample, a first nucleic acid 404 from a first cell 400, and a secondnucleic acid 406 from a second cell 402 are each partitioned along withtheir own sets of barcode oligonucleotides as described above. Thenucleic acids may comprise a chromosome, entire genome or other largenucleic acid from the cells.

Within each partition, each cell's nucleic acids 404 and 406 are then processed to separately provide overlapping sets of second fragments of the first fragment(s), e.g., second fragment sets 408 and 410. This processing also provides the second fragments with a barcode sequence that is the same for each of the second fragments derived from a particular first fragment. As shown, the barcode sequence for second fragment set 408 is denoted by “1” while the barcode sequence for fragment set 410 is denoted by “2”. A diverse library of barcodes may be used to differentially barcode large numbers of different fragment sets. However, it is not necessary for every second fragment set from a different first fragment to be barcoded with different barcode sequences. In fact, in many cases, multiple different first fragments may be processed concurrently to include the same barcode sequence. Diverse barcode libraries are described in detail elsewhere herein.

The barcoded fragments, e.g., from fragment sets 408 and 410, may then be pooled for sequencing using, for example, sequence by synthesis technologies available from Illumina or the Ion Torrent division of Thermo-Fisher, Inc. Once sequenced, the sequence reads 412 can be attributed to their respective fragment set, e.g., as shown in aggregated reads 414 and 416, at least in part based upon the included barcodes, and in some cases, in part based upon the sequence of the fragment itself. The attributed sequence reads for each fragment set are then assembled to provide the assembled sequence for each cell's nucleic acids, e.g., sequences 418 and 420, which in turn, may be attributed to individual cells, e.g., cells 400 and 402.
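The attribution of pooled reads to their fragment sets can be sketched as a grouping of reads by barcode. The example below is a hedged illustration only: it assumes that the barcode occupies a fixed number of bases at the start of each read, and the barcode length, the function name, and the read sequences are all hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 4  # hypothetical; real barcode lengths depend on the library design


def group_reads_by_barcode(reads, barcode_length=BARCODE_LENGTH):
    """Map each barcode to the list of insert sequences that carried it."""
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            continue  # too short to hold a barcode plus any insert sequence
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


# Hypothetical pooled reads from two partitions (barcodes AAAA and CCCC).
pooled = ["AAAAGATTACA", "CCCCTTGACCA", "AAAATACAGGT", "CCCCACCATGG"]
by_barcode = group_reads_by_barcode(pooled)
print(by_barcode["AAAA"])  # ['GATTACA', 'TACAGGT']
print(by_barcode["CCCC"])  # ['TTGACCA', 'ACCATGG']
```

Each group could then be passed to an assembler separately; that step is outside the scope of this sketch.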

While described in terms of analyzing the genetic material presentwithin cells, the methods and systems described herein may have muchbroader applicability, including the ability to characterize otheraspects of individual cells or cell populations, by allowing for theallocation of reagents to individual cells, and providing for theattributable analysis or characterization of those cells in response tothose reagents. These methods and systems are particularly valuable inbeing able to characterize cells for, e.g., research, diagnostic,pathogen identification, and many other purposes. By way of example, awide range of different cell surface features, e.g., cell surfaceproteins like cluster of differentiation or CD proteins, havesignificant diagnostic relevance in characterization of diseases likecancer.

In one particularly useful application, the methods and systemsdescribed herein may be used to characterize cell features, such as cellsurface features, e.g., proteins, receptors, etc. In particular, themethods described herein may be used to attach reporter molecules tothese cell features, that when partitioned as described above, may bebarcoded and analyzed, e.g., using DNA sequencing technologies, toascertain the presence, and in some cases, relative abundance orquantity of such cell features within an individual cell or populationof cells.

In a particular example, a library of potential cell binding ligands, e.g., antibodies, antibody fragments, cell surface receptor binding molecules, or the like, may be provided associated with a first set of nucleic acid reporter molecules, e.g., where a different reporter oligonucleotide sequence is associated with a specific ligand, and therefore capable of binding to a specific cell surface feature. In some aspects, different members of the library may be characterized by the presence of a different oligonucleotide sequence label, e.g., an antibody to a first type of cell surface protein or receptor would have associated with it a first known reporter oligonucleotide sequence, while an antibody to a second receptor protein would have a different known reporter oligonucleotide sequence associated with it. Prior to co-partitioning, the cells would be incubated with the library of ligands, that may represent antibodies to a broad panel of different cell surface features, e.g., receptors, proteins, etc., and which include their associated reporter oligonucleotides. Unbound ligands are washed from the cells, and the cells are then co-partitioned along with the barcode oligonucleotides described above. As a result, the partitions will include the cell or cells, as well as the bound ligands and their known, associated reporter oligonucleotides.

Without the need for lysing the cells within the partitions, one couldthen subject the reporter oligonucleotides to the barcoding operationsdescribed above for cellular nucleic acids, to produce barcoded,reporter oligonucleotides, where the presence of the reporteroligonucleotides can be indicative of the presence of the particularcell surface feature, and the barcode sequence will allow theattribution of the range of different cell surface features to a givenindividual cell or population of cells based upon the barcode sequencethat was co-partitioned with that cell or population of cells. As aresult, one may generate a cell-by-cell profile of the cell surfacefeatures within a broader population of cells. This aspect of themethods and systems described herein, is described in greater detailbelow.

This example is schematically illustrated in FIG. 5. As shown, a population of cells, represented by cells 502 and 504, is incubated with a library of cell surface associated reagents, e.g., antibodies, cell surface binding proteins, ligands or the like, where each different type of binding group includes a nucleic acid reporter molecule associated with it, shown as ligands and associated reporter molecules 506, 508, 510 and 512 (with the reporter molecules being indicated by the differently shaded circles). Where the cell expresses the surface features that are bound by the library, the ligands and their associated reporter molecules can become associated or coupled with the cell surface. Individual cells are then partitioned into separate partitions, e.g., droplets 514 and 516, along with their associated ligand/reporter molecules, as well as an individual barcode oligonucleotide bead as described elsewhere herein, e.g., beads 522 and 524, respectively. As with other examples described herein, the barcoded oligonucleotides are released from the beads and used to attach the barcode sequence to the reporter molecules present within each partition, with a barcode that is common to a given partition, but which varies widely among different partitions. For example, as shown in FIG. 5, the reporter molecules that associate with cell 502 in partition 514 are barcoded with barcode sequence 518, while the reporter molecules associated with cell 504 in partition 516 are barcoded with barcode 520. As a result, one is provided with a library of oligonucleotides that reflects the surface ligands of the cell, as reflected by the reporter molecule, but which is substantially attributable to an individual cell by virtue of a common barcode sequence, allowing a single cell level profiling of the surface characteristics of the cell.
As will be appreciated, this process is not limited to cell surface receptors but may be used to identify the presence of a wide variety of specific cell structures, chemistries or other characteristics.
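The cell-by-cell surface profile described with reference to FIG. 5 can be sketched as a tally of barcoded reporter sequences per partition barcode. This is an illustrative sketch only: the reporter sequences, the feature names in the lookup table, and the function name are hypothetical, and the labels "518" and "520" merely stand in for the two barcode sequences of FIG. 5.

```python
from collections import Counter, defaultdict

# Hypothetical lookup from reporter oligonucleotide sequence to the cell
# surface feature bound by the ligand carrying that reporter.
REPORTER_TO_FEATURE = {"ACGTAC": "CD3", "TGCATG": "CD19", "GGATCC": "CD45"}


def surface_profiles(barcoded_reporters):
    """Count each recognized surface feature per partition barcode.

    barcoded_reporters: iterable of (barcode, reporter_sequence) pairs,
    one pair per sequenced barcoded reporter oligonucleotide.
    """
    profiles = defaultdict(Counter)
    for barcode, reporter in barcoded_reporters:
        feature = REPORTER_TO_FEATURE.get(reporter)
        if feature is None:
            continue  # reporter sequence not in the known library
        profiles[barcode][feature] += 1
    return {barcode: dict(counts) for barcode, counts in profiles.items()}


observations = [
    ("518", "ACGTAC"), ("518", "GGATCC"), ("518", "ACGTAC"),
    ("520", "TGCATG"), ("520", "GGATCC"), ("520", "NNNNNN"),
]
print(surface_profiles(observations))
# {'518': {'CD3': 2, 'CD45': 1}, '520': {'CD19': 1, 'CD45': 1}}
```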

III. Applications of Single Cell Analysis

There are a wide variety of different applications of the single cell processing and analysis methods and systems described herein, including analysis of specific individual cells, analysis of different cell types within populations of differing cell types, analysis and characterization of large populations of cells for environmental, human health, epidemiological, forensic, or any of a wide variety of different applications.

A particularly valuable application of the single cell analysis processes described herein is in the sequencing and characterization of cancer cells. In particular, conventional analytical techniques, including the ensemble sequencing processes alluded to above, are not highly adept at picking up small variations in genomic make-up of cancer cells, particularly where those exist in a sea of normal tissue cells. Further, even as between tumor cells, wide variations can exist and can be masked by the ensemble approaches to sequencing (See, e.g., Patel, et al., Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma, Science DOI: 10.1126/science.1254257 (Published online Jun. 12, 2014)). Cancer cells may be derived from solid tumors, hematological malignancies, cell lines, or obtained as circulating tumor cells, and subjected to the partitioning processes described above. Upon analysis, one can identify individual cell sequences as deriving from a single cell or small group of cells, and distinguish those over normal tissue cell sequences. Further, as described in co-pending U.S. patent application Ser. No. 14/752,589 (US20150376700), filed Jun. 26, 2015, the full disclosure of which is hereby incorporated herein by reference in its entirety for all purposes, one may also obtain phased sequence information from each cell, allowing clearer characterization of the haplotype variants within a cancer cell. The single cell analysis approach is particularly useful for systems and methods involving low quantities of input nucleic acids, as described in co-pending U.S. patent application Ser. No. 14/752,602 (US20150376605), filed Jun. 26, 2015, the full disclosure of which is hereby incorporated herein by reference in its entirety for all purposes.

As with cancer cell analysis, the analysis and diagnosis of fetal health or abnormality through the analysis of fetal cells is a difficult task using conventional techniques. In particular, in the absence of relatively invasive procedures, such as amniocentesis, obtaining fetal cell samples can employ harvesting those cells from the maternal circulation. As will be appreciated, such circulating fetal cells make up an extremely small fraction of the overall cellular population of that circulation. As a result, complex analyses are performed in order to characterize what of the obtained data is likely derived from fetal cells as opposed to maternal cells. By employing the single cell characterization methods and systems described herein, however, one can attribute genetic make-up to individual cells, and categorize those cells as maternal or fetal based upon their respective genetic make-up. Further, the genetic sequence of fetal cells may be used to identify any of a number of genetic disorders, including, e.g., aneuploidy such as Down syndrome, Edwards syndrome, and Patau syndrome.

The ability to characterize individual cells from larger diverse populations of cells is also of significant value in both environmental testing as well as in forensic analysis, where samples may, by their nature, be made up of diverse populations of cells and other material that “contaminate” the sample, relative to the cells for which the sample is being tested, e.g., environmental indicator organisms, toxic organisms, and the like for, e.g., environmental and food safety testing, victim and/or perpetrator cells in forensic analysis for sexual assault, and other violent crimes, and the like.

Additional useful applications of the above described single cell sequencing and characterization processes are in the field of neuroscience research and diagnosis. In particular, neural cells can include long interspersed nuclear elements (LINEs), or ‘jumping’ genes that can move around the genome, which cause each neuron to differ from its neighbor cells. Research has shown that the number of LINEs in human brain exceeds that of other tissues, e.g., heart and liver tissue, with between 80 and 300 unique insertions (See, e.g., Coufal, N. G. et al. Nature 460, 1127-1131 (2009)). These differences have been postulated as being related to a person's susceptibility to neurological disorders (see, e.g., Muotri, A. R. et al. Nature 468, 443-446 (2010)), or as providing the brain with a diversity with which to respond to challenges. As such, the methods described herein may be used in the sequencing and characterization of individual neural cells.

The single cell analysis methods described herein are also useful in theanalysis of gene expression, as noted above, both in terms ofidentification of RNA transcripts and their quantitation. In particular,using the single cell level analysis methods described herein, one canisolate and analyze the RNA transcripts present in individual cells,populations of cells, or subsets of populations of cells. In particular,in some cases, the barcode oligonucleotides may be configured to prime,replicate and consequently yield barcoded fragments of RNA fromindividual cells. For example, in some cases, the barcodeoligonucleotides may include mRNA specific priming sequences, e.g.,poly-dT primer segments that allow priming and replication of mRNA in areverse transcription reaction or other targeted priming sequences(e.g., gene-specific primers). Alternatively or additionally, random RNApriming may be carried out using random N-mer primer segments of thebarcode oligonucleotides.

FIG. 6 provides a schematic of one example method for RNA expression analysis in individual cells using the methods described herein. As shown, at operation 602 a cell containing sample is sorted for viable cells, which are quantified and diluted for subsequent partitioning. At operation 604, the individual cells are separately co-partitioned with gel beads bearing the barcoding oligonucleotides as described herein. The cells are lysed and the barcoded oligonucleotides released into the partitions at operation 606, where they interact with and hybridize to the mRNA at operation 608, e.g., by virtue of a poly-dT primer sequence, which is complementary to the poly-A tail of the mRNA. Using the poly-dT barcode oligonucleotide as a priming sequence, a reverse transcription reaction is carried out using the engineered reverse transcriptase enzymes described herein at operation 610 to synthesize a cDNA transcript of the mRNA that includes the barcode sequence. The barcoded cDNA transcripts are then subjected to additional amplification at operation 612, e.g., using a PCR process, and purification at operation 614, before they are placed on a nucleic acid sequencing system for determination of the cDNA sequence and its associated barcode sequence(s). In some cases, as shown, operations 602 through 608 can occur while the reagents remain in their original droplet or partition, while operations 612 through 616 can occur in bulk (e.g., outside of the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 612 through 616. In some cases, barcode oligonucleotides may be digested with exonucleases after the emulsion is broken. Exonuclease activity can be inhibited by ethylenediaminetetraacetic acid (EDTA) following primer digestion.
In some cases, operation 610 may be performed either within the partitions based upon co-partitioning of the reverse transcription mixture, e.g., reverse transcriptase and associated reagents, or it may be performed in bulk.

As noted elsewhere herein, the structure of the barcode oligonucleotidesmay include a number of sequence elements in addition to theoligonucleotide barcode sequence. One example of a barcodeoligonucleotide for use in RNA analysis as described above is shown inFIG. 7. As shown, the overall oligonucleotide 702 is coupled to a bead704 by a releasable linkage 706, such as a disulfide linker. Theoligonucleotide may include functional sequences that are used insubsequent processing, such as functional sequence 708, which mayinclude one or more of a sequencer specific flow cell attachmentsequence, e.g., a P5 sequence for Illumina sequencing systems, as wellas sequencing primer sequences, e.g., a R1 primer for Illuminasequencing systems. A barcode sequence 710 is included within thestructure for use in barcoding the sample RNA. An mRNA specific primingsequence, such as poly-dT sequence 712 is also included in theoligonucleotide structure. An anchoring sequence segment 714 may beincluded to ensure that the poly-dT sequence hybridizes at the sequenceend of the mRNA. This anchoring sequence can include a random shortsequence of nucleotides, e.g., 1-mer, 2-mer, 3-mer or longer sequence,which will ensure that the poly-dT segment is more likely to hybridizeat the sequence end of the poly-A tail of the mRNA. An additionalsequence segment 716 may be provided within the oligonucleotidesequence. In some cases, this additional sequence provides a uniquemolecular sequence segment, e.g., as a random sequence (e.g., such as arandom N-mer sequence) that varies across individual oligonucleotidescoupled to a single bead, whereas barcode sequence 710 can be constantamong oligonucleotides tethered to an individual bead. This uniquesequence serves to provide a unique identifier of the starting mRNAmolecule that was captured, in order to allow quantitation of the numberof original expressed RNA. 
As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead. This unique molecular sequence segment may include from 5 to about 8 or more nucleotides within the sequence of the oligonucleotides. In some cases, the unique molecular sequence segment can be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length or longer. In some cases, the unique molecular sequence segment can be at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length or longer. In some cases, the unique molecular sequence segment can be at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length or shorter.
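As a rough illustration of the oligonucleotide structure of FIG. 7, the sketch below assembles a barcode oligonucleotide from its segments and computes how many distinct unique molecular sequences a random N-mer of a given length can encode (4 raised to the power N). The 5′ to 3′ segment order, the choice of a non-T single-base anchor, the default lengths, and all sequences and function names are assumptions made for illustration; the disclosure does not fix them.

```python
import random

NUCLEOTIDES = "ACGT"


def umi_diversity(length):
    """Number of distinct random N-mers of the given length."""
    return 4 ** length


def build_barcode_oligo(functional, barcode, umi_length=8,
                        poly_dt_length=30, rng=None):
    """Assemble one oligonucleotide; returns (segments, full_sequence).

    Assumed 5'->3' order: functional, barcode, unique molecular segment,
    poly-dT, anchor. This ordering is an illustrative assumption.
    """
    rng = rng or random.Random()
    umi = "".join(rng.choice(NUCLEOTIDES) for _ in range(umi_length))
    anchor = rng.choice("ACG")  # 1-mer anchor; excluding T is an assumption
    segments = {
        "functional": functional,
        "barcode": barcode,
        "umi": umi,
        "poly_dt": "T" * poly_dt_length,
        "anchor": anchor,
    }
    return segments, "".join(segments.values())


# Placeholder sequences only; these are not real P5 or R1 sequences.
segments, oligo = build_barcode_oligo("ACACGACG", "GATTACAGATTACA",
                                      rng=random.Random(0))
print(len(oligo))          # 61
print(umi_diversity(5))    # 1024
print(umi_diversity(8))    # 65536
```

Oligonucleotides built for the same bead would share `functional` and `barcode` while drawing a fresh `umi` on each call.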

In operation, and with reference to FIG. 6 and FIG. 7, a cell isco-partitioned along with a barcode bearing bead and lysed while thebarcoded oligonucleotides are released from the bead. The poly-dTportion of the released barcode oligonucleotide then hybridizes to thepoly-A tail of the mRNA. The poly-dT segment then primes the reversetranscription of the mRNA to produce a cDNA transcript of the mRNA, butwhich includes each of the sequence segments 708-716 of the barcodeoligonucleotide. Again, because the oligonucleotide 702 includes ananchoring sequence 714, it will more likely hybridize to and primereverse transcription at the sequence end of the poly-A tail of themRNA. Within any given partition, all of the cDNA transcripts of theindividual mRNA molecules will include a common barcode sequence segment710. However, by including the unique random N-mer sequence, thetranscripts made from different mRNA molecules within a given partitionwill vary at this unique sequence. This provides a quantitation featurethat can be identifiable even following any subsequent amplification ofthe contents of a given partition, e.g., the number of unique segmentsassociated with a common barcode can be indicative of the quantity ofmRNA originating from a single partition, and thus, a single cell. Asnoted above, the transcripts are then amplified, cleaned up andsequenced to identify the sequence of the cDNA transcript of the mRNA,as well as to sequence the barcode segment and the unique sequencesegment.
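The quantitation feature described above can be sketched as counting distinct unique sequence segments per barcode, so that amplification duplicates of the same starting molecule are counted once. The example is a hedged sketch: it assumes each read has already been resolved to a (barcode, unique segment, gene) record by upstream processing, and the record values and function name are hypothetical.

```python
from collections import defaultdict


def count_molecules(records):
    """Count distinct unique segments per (barcode, gene).

    records: iterable of (barcode, unique_segment, gene) tuples, one per read.
    Reads sharing all three values are treated as copies of one molecule.
    """
    seen = defaultdict(set)
    for barcode, unique_segment, gene in records:
        seen[(barcode, gene)].add(unique_segment)
    return {key: len(segments) for key, segments in seen.items()}


# Hypothetical reads; the first two are amplification copies of one molecule.
reads = [
    ("BC1", "AACC", "GAPDH"),
    ("BC1", "AACC", "GAPDH"),
    ("BC1", "GGTT", "GAPDH"),
    ("BC1", "AACC", "ACTB"),
    ("BC2", "AACC", "GAPDH"),
]
print(count_molecules(reads))
# {('BC1', 'GAPDH'): 2, ('BC1', 'ACTB'): 1, ('BC2', 'GAPDH'): 1}
```

This sketch does not correct for sequencing errors within the unique segment, which would inflate the counts.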

As noted elsewhere herein, while a poly-dT primer sequence is described, other targeted or random priming sequences may also be used in priming the reverse transcription reaction. Likewise, although described as releasing the barcoded oligonucleotides into the partition along with the contents of the lysed cells, it will be appreciated that in some cases, the gel bead bound oligonucleotides may be used to hybridize and capture the mRNA on the solid phase of the gel beads, in order to facilitate the separation of the RNA from other cell contents.

An additional example of a barcode oligonucleotide for use in RNAanalysis, including messenger RNA (mRNA, including mRNA obtained from acell) analysis, is shown in FIG. 9A. As shown, the overalloligonucleotide 902 can be coupled to a bead 904 by a releasable linkage906, such as a disulfide linker. The oligonucleotide may includefunctional sequences that are used in subsequent processing, such asfunctional sequence 908, which may include a sequencer specific flowcell attachment sequence, e.g., a P5 sequence for Illumina sequencingsystems, as well as functional sequence 910, which may includesequencing primer sequences, e.g., a R1 primer binding site for Illuminasequencing systems. A barcode sequence 912 is included within thestructure for use in barcoding the sample RNA. An RNA specific (e.g.,mRNA specific) priming sequence, such as poly-dT sequence 914 is alsoincluded in the oligonucleotide structure. An anchoring sequence segment(not shown) may be included to ensure that the poly-dT sequencehybridizes at the sequence end of the mRNA. An additional sequencesegment 916 may be provided within the oligonucleotide sequence. Thisadditional sequence can provide a unique molecular sequence segment,e.g., as a random N-mer sequence that varies across individualoligonucleotides coupled to a single bead, whereas barcode sequence 912can be constant among oligonucleotides tethered to an individual bead.As described elsewhere herein, this unique sequence can serve to providea unique identifier of the starting mRNA molecule that was captured, inorder to allow quantitation of the number of original expressed RNA,e.g., mRNA counting. 
As will be appreciated, although shown as a single oligonucleotide tethered to the surface of a bead, individual beads can include tens to hundreds of thousands or even millions of individual oligonucleotide molecules, where, as noted, the barcode segment can be constant or relatively constant for a given bead, but where the variable or unique sequence segment will vary across an individual bead.

In an example method of cellular RNA (e.g., mRNA) analysis and inreference to FIG. 9A, a cell is co-partitioned along with a barcodebearing bead, switch oligo 924, and other reagents such as reversetranscriptase, a reducing agent and dNTPs into a partition (e.g., adroplet in an emulsion). In operation 950, the cell is lysed while thebarcoded oligonucleotides 902 are released from the bead (e.g., via theaction of the reducing agent) and the poly-dT segment 914 of thereleased barcode oligonucleotide then hybridizes to the poly-A tail ofmRNA 920 that is released from the cell. Next, in operation 952 thepoly-dT segment 914 is extended in a reverse transcription reactionusing the mRNA as a template to produce a cDNA transcript 922complementary to the mRNA and also includes each of the sequencesegments 908, 912, 910, 916 and 914 of the barcode oligonucleotide.Terminal transferase activity of the reverse transcriptase can addadditional bases to the cDNA transcript (e.g., polyC). The switch oligo924 may then hybridize with the additional bases added to the cDNAtranscript and facilitate template switching. A sequence complementaryto the switch oligo sequence can then be incorporated into the cDNAtranscript 922 via extension of the cDNA transcript 922 using the switcholigo 924 as a template. Within any given partition, all of the cDNAtranscripts of the individual mRNA molecules will include a commonbarcode sequence segment 912. However, by including the unique randomN-mer sequence 916, the transcripts made from different mRNA moleculeswithin a given partition will vary at this unique sequence. As describedelsewhere herein, this provides a quantitation feature that can beidentifiable even following any subsequent amplification of the contentsof a given partition, e.g., the number of unique segments associatedwith a common barcode can be indicative of the quantity of mRNAoriginating from a single partition, and thus, a single cell. 
Followingoperation 952, the cDNA transcript 922 is then amplified with primers926 (e.g., PCR primers) in operation 954. Next, the amplified product isthen purified (e.g., via solid phase reversible immobilization (SPRI))in operation 956. At operation 958, the amplified product is thensheared, ligated to additional functional sequences, and furtheramplified (e.g., via PCR). The functional sequences may include asequencer specific flow cell attachment sequence 930, e.g., a P7sequence for Illumina sequencing systems, as well as functional sequence928, which may include a sequencing primer binding site, e.g., for a R2primer for Illumina sequencing systems, as well as functional sequence932, which may include a sample index, e.g., an i7 sample index sequencefor Illumina sequencing systems. In some cases, operations 950 and 952can occur in the partition, while operations 954, 956 and 958 can occurin bulk solution (e.g., in a pooled mixture outside of the partition).In the case where a partition is a droplet in an emulsion, the emulsioncan be broken and the contents of the droplet pooled in order tocomplete operations 954, 956 and 958. In some cases, operation 954 maybe completed in the partition. In some cases, barcode oligonucleotidesmay be digested with exonucleases after the emulsion is broken.Exonuclease activity can be inhibited by ethylenediaminetetraacetic acid(EDTA) following primer digestion. Although described in terms ofspecific sequence references used for certain sequencing systems, e.g.,Illumina systems, it will be understood that the reference to thesesequences is for illustration purposes only, and the methods describedherein may be configured for use with other sequencing systemsincorporating specific priming, attachment, index, and other operationalsequences used in those systems, e.g., systems available from IonTorrent, Oxford Nanopore, Genia, Pacific Biosciences, Complete Genomics,and the like.
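The reverse transcription and template switching of operation 952 can be modeled at the sequence level as a string operation. The sketch below is a simplified illustration, not a description of enzyme behavior: it assumes that the poly-dT segment pairs with the entire poly-A tail, that exactly three non-templated C bases are added, and that the switch oligo ends in GGG at its 3′ end. All sequences and function names are hypothetical.

```python
# Complement table producing DNA bases from either RNA or DNA input.
DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_to_dna(seq):
    """Reverse complement of an RNA or DNA sequence, written as DNA."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def first_strand_cdna(mrna, primer_5prime, switch_oligo, tail_bases="CCC"):
    """Return the modeled first-strand cDNA, written 5'->3'.

    mrna: RNA sequence 5'->3' ending in a poly-A tail.
    primer_5prime: oligonucleotide segments upstream of the poly-dT segment.
    switch_oligo: 5'->3' sequence whose 3' end pairs with tail_bases.
    """
    body = mrna.rstrip("A")  # simplification: body must not itself end in A
    tail_length = len(mrna) - len(body)
    if tail_length == 0:
        raise ValueError("mRNA has no poly-A tail to prime from")
    # Poly-dT priming followed by copying of the mRNA body.
    cdna = primer_5prime + "T" * tail_length + reverse_complement_to_dna(body)
    # Non-templated bases added at the 3' end of the cDNA.
    cdna += tail_bases
    pairing = reverse_complement_to_dna(tail_bases)
    if not switch_oligo.endswith(pairing):
        return cdna  # switch oligo cannot hybridize; no template switching
    # Extension continues using the rest of the switch oligo as template.
    cdna += reverse_complement_to_dna(switch_oligo[:-len(tail_bases)])
    return cdna


mrna = "GGAUCCAUGGCU" + "A" * 10
cdna = first_strand_cdna(mrna, "GATTACA", "AAGCTGTCGGG")
print(cdna)
# GATTACATTTTTTTTTTAGCCATGGATCCCCCGACAGCTT
```

In this model the 3′ end of the resulting cDNA is the reverse complement of the full switch oligo, which is what allows a primer matching the switch oligo sequence to be used in subsequent amplification.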

In an alternative example of a barcode oligonucleotide for use in RNA (e.g., cellular RNA) analysis as shown in FIG. 9A, functional sequence 908 may be a P7 sequence and functional sequence 910 may be a R2 primer binding site. Moreover, the functional sequence 930 may be a P5 sequence, functional sequence 928 may be a R1 primer binding site, and functional sequence 932 may be an i5 sample index sequence for Illumina sequencing systems. The configuration of the constructs generated by such a barcode oligonucleotide can help minimize (or avoid) sequencing of the poly-dT sequence during sequencing.

Shown in FIG. 9B is another example method for RNA analysis, includingcellular mRNA analysis. In this method, the switch oligo 924 isco-partitioned with the individual cell and barcoded bead along withreagents such as reverse transcriptase, a reducing agent and dNTPs intoa partition (e.g., a droplet in an emulsion). The switch oligo 924 maybe labeled with an additional tag 934, e.g. biotin. In operation 951,the cell is lysed while the barcoded oligonucleotides 902 (e.g., asshown in FIG. 9A) are released from the bead (e.g., via the action ofthe reducing agent). In some cases, sequence 908 is a P7 sequence andsequence 910 is a R2 primer binding site. In other cases, sequence 908is a P5 sequence and sequence 910 is a R1 primer binding site. Next, thepoly-dT segment 914 of the released barcode oligonucleotide hybridizesto the poly-A tail of mRNA 920 that is released from the cell. Inoperation 953, the poly-dT segment 914 is then extended in a reversetranscription reaction using the mRNA as a template to produce a cDNAtranscript 922 complementary to the mRNA and also includes each of thesequence segments 908, 912, 910, 916 and 914 of the barcodeoligonucleotide. Terminal transferase activity of the reversetranscriptase can add additional bases to the cDNA transcript (e.g.,polyC). The switch oligo 924 may then hybridize with the cDNA transcriptand facilitate template switching. A sequence complementary to theswitch oligo sequence can then be incorporated into the cDNA transcript922 via extension of the cDNA transcript 922 using the switch oligo 924as a template. Next, an isolation operation 960 can be used to isolatethe cDNA transcript 922 from the reagents and oligonucleotides in thepartition. The additional tag 934, e.g. biotin, can be contacted with aninteracting tag 936, e.g., streptavidin, which may be attached to amagnetic bead 938. 
At operation 960 the cDNA can be isolated with apull-down operation (e.g., via magnetic separation, centrifugation)before amplification (e.g., via PCR) in operation 955, followed bypurification (e.g., via solid phase reversible immobilization (SPRI)) inoperation 957 and further processing (shearing, ligation of sequences928, 932 and 930 and subsequent amplification (e.g., via PCR)) inoperation 959. In some cases where sequence 908 is a P7 sequence andsequence 910 is a R2 primer binding site, sequence 930 is a P5 sequenceand sequence 928 is a R1 primer binding site and sequence 932 is an i5sample index sequence. In some cases where sequence 908 is a P5 sequenceand sequence 910 is a R1 primer binding site, sequence 930 is a P7sequence and sequence 928 is a R2 primer binding site and sequence 932is an i7 sample index sequence. In some cases, as shown, operations 951and 953 can occur in the partition, while operations 960, 955, 957 and959 can occur in bulk solution (e.g., in a pooled mixture outside of thepartition). In the case where a partition is a droplet in an emulsion,the emulsion can be broken and the contents of the droplet pooled inorder to complete operation 960. The operations 955, 957, and 959 canthen be carried out following operation 960 after the transcripts arepooled for processing.

Shown in FIG. 9C is another example method for RNA analysis, includingcellular mRNA analysis. In this method, the switch oligo 924 isco-partitioned with the individual cell and barcoded bead along withreagents such as reverse transcriptase, a reducing agent and dNTPs in apartition (e.g., a droplet in an emulsion). In operation 961, the cellis lysed while the barcoded oligonucleotides 902 (e.g., as shown in FIG.9A) are released from the bead (e.g., via the action of the reducingagent). In some cases, sequence 908 is a P7 sequence and sequence 910 isa R2 primer binding site. In other cases, sequence 908 is a P5 sequenceand sequence 910 is a R1 primer binding site. Next, the poly-dT segment914 of the released barcode oligonucleotide then hybridizes to thepoly-A tail of mRNA 920 that is released from the cell. Next, inoperation 963 the poly-dT segment 914 is then extended in a reversetranscription reaction using the mRNA as a template to produce a cDNAtranscript 922 complementary to the mRNA and also includes each of thesequence segments 908, 912, 910, 916 and 914 of the barcodeoligonucleotide. Terminal transferase activity of the reversetranscriptase can add additional bases to the cDNA transcript (e.g.,polyC). The switch oligo 924 may then hybridize with the cDNA transcriptand facilitate template switching. A sequence complementary to theswitch oligo sequence can then be incorporated into the cDNA transcript922 via extension of the cDNA transcript 922 using the switch oligo 924as a template. Following operation 961 and operation 963, mRNA 920 andcDNA transcript 922 are denatured in operation 962. At operation 964, asecond strand is extended from a primer 940 having an additional tag942, e.g. biotin, and hybridized to the cDNA transcript 922. Also inoperation 964, the biotin labeled second strand can be contacted with aninteracting tag 936, e.g. streptavidin, which may be attached to amagnetic bead 938. 
The cDNA can be isolated with a pull-down operation(e.g., via magnetic separation, centrifugation) before amplification(e.g., via polymerase chain reaction (PCR)) in operation 965, followedby purification (e.g., via solid phase reversible immobilization (SPRI))in operation 967 and further processing (shearing, ligation of sequences928, 932 and 930 and subsequent amplification (e.g., via PCR)) inoperation 969. In some cases where sequence 908 is a P7 sequence andsequence 910 is a R2 primer binding site, sequence 930 is a P5 sequenceand sequence 928 is a R1 primer binding site and sequence 932 is an i5sample index sequence. In some cases where sequence 908 is a P5 sequenceand sequence 910 is a R1 primer binding site, sequence 930 is a P7sequence and sequence 928 is a R2 primer binding site and sequence 932is an i7 sample index sequence. In some cases, operations 961 and 963can occur in the partition, while operations 962, 964, 965, 967, and 969can occur in bulk (e.g., outside the partition). In the case where apartition is a droplet in an emulsion, the emulsion can be broken andthe contents of the droplet pooled in order to complete operations 962,964, 965, 967 and 969.

Shown in FIG. 9D is another example method for RNA analysis, includingcellular mRNA analysis. In this method, the switch oligo 924 isco-partitioned with the individual cell and barcoded bead along withreagents such as reverse transcriptase, a reducing agent and dNTPs. Inoperation 971, the cell is lysed while the barcoded oligonucleotides 902(e.g., as shown in FIG. 9A) are released from the bead (e.g., via theaction of the reducing agent). In some cases, sequence 908 is a P7sequence and sequence 910 is a R2 primer binding site. In other cases,sequence 908 is a P5 sequence and sequence 910 is a R1 primer bindingsite. Next the poly-dT segment 914 of the released barcodeoligonucleotide then hybridizes to the poly-A tail of mRNA 920 that isreleased from the cell. Next in operation 973, the poly-dT segment 914is then extended in a reverse transcription reaction using the mRNA as atemplate to produce a cDNA transcript 922 complementary to the mRNA andalso includes each of the sequence segments 908, 912, 910, 916 and 914of the barcode oligonucleotide. Terminal transferase activity of thereverse transcriptase can add additional bases to the cDNA transcript(e.g., polyC). The switch oligo 924 may then hybridize with the cDNAtranscript and facilitate template switching. A sequence complementaryto the switch oligo sequence can then be incorporated into the cDNAtranscript 922 via extension of the cDNA transcript 922 using the switcholigo 924 as a template. In operation 966, the mRNA 920, cDNA transcript922 and switch oligo 924 can be denatured, and the cDNA transcript 922can be hybridized with a capture oligonucleotide 944 labeled with anadditional tag 946, e.g. biotin. In this operation, the biotin-labeledcapture oligonucleotide 944, which is hybridized to the cDNA transcript,can be contacted with an interacting tag 936, e.g. streptavidin, whichmay be attached to a magnetic bead 938. 
Following separation from otherspecies (e.g., excess barcoded oligonucleotides) using a pull-downoperation (e.g., via magnetic separation, centrifugation), the cDNAtranscript can be amplified (e.g., via PCR) with primers 926 atoperation 975, followed by purification (e.g., via solid phasereversible immobilization (SPRI)) in operation 977 and furtherprocessing (shearing, ligation of sequences 928, 932 and 930 andsubsequent amplification (e.g., via PCR)) in operation 979. In somecases where sequence 908 is a P7 sequence and sequence 910 is a R2primer binding site, sequence 930 is a P5 sequence and sequence 928 is aR1 primer binding site and sequence 932 is an i5 sample index sequence.In other cases where sequence 908 is a P5 sequence and sequence 910 is aR1 primer binding site, sequence 930 is a P7 sequence and sequence 928is a R2 primer binding site and sequence 932 is an i7 sample indexsequence. In some cases, operations 971 and 973 can occur in thepartition, while operations 966, 975, 977 (purification), and 979 canoccur in bulk (e.g., outside the partition). In the case where apartition is a droplet in an emulsion, the emulsion can be broken andthe contents of the droplet pooled in order to complete operations 966,975, 977 and 979.

Shown in FIG. 9E is another example method for RNA analysis, including cellular RNA analysis. In this method, an individual cell is co-partitioned along with a barcode bearing bead, a switch oligo 990, and other reagents such as reverse transcriptase, a reducing agent and dNTPs into a partition (e.g., a droplet in an emulsion). In operation 981, the cell is lysed while the barcoded oligonucleotides (e.g., 902 as shown in FIG. 9A) are released from the bead (e.g., via the action of the reducing agent). In some cases, sequence 908 is a P7 sequence and sequence 910 is a R2 primer binding site. In other cases, sequence 908 is a P5 sequence and sequence 910 is a R1 primer binding site. Next, the poly-dT segment of the released barcode oligonucleotide hybridizes to the poly-A tail of mRNA 920 released from the cell. Next, at operation 983, the poly-dT segment is extended in a reverse transcription reaction to produce a cDNA transcript 922 that is complementary to the mRNA and also includes each of the sequence segments 908, 912, 910, 916 and 914 of the barcode oligonucleotide. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC). The switch oligo 990 may then hybridize with the cDNA transcript and facilitate template switching. A sequence complementary to the switch oligo sequence and including a T7 promoter sequence can be incorporated into the cDNA transcript 922. At operation 968, a second strand is synthesized and at operation 970 the T7 promoter sequence can be used by T7 polymerase to produce RNA transcripts in in vitro transcription. At operation 985 the RNA transcripts can be purified (e.g., via solid phase reversible immobilization (SPRI)), reverse transcribed to form DNA transcripts, and a second strand can be synthesized for each of the DNA transcripts. In some cases, prior to purification, the RNA transcripts can be contacted with a DNase (e.g., DNase I) to break down residual DNA. 
At operation 987 the DNAtranscripts are then fragmented and ligated to additional functionalsequences, such as sequences 928, 932 and 930 and, in some cases,further amplified (e.g., via PCR). In some cases where sequence 908 is aP7 sequence and sequence 910 is a R2 primer binding site, sequence 930is a P5 sequence and sequence 928 is a R1 primer binding site andsequence 932 is an i5 sample index sequence. In some cases wheresequence 908 is a P5 sequence and sequence 910 is a R1 primer bindingsite, sequence 930 is a P7 sequence and sequence 928 is a R2 primerbinding site and sequence 932 is an i7 sample index sequence. In somecases, prior to removing a portion of the DNA transcripts, the DNAtranscripts can be contacted with an RNase to break down residual RNA.In some cases, operations 981 and 983 can occur in the partition, whileoperations 968, 970, 985 and 987 can occur in bulk (e.g., outside thepartition). In the case where a partition is a droplet in an emulsion,the emulsion can be broken and the contents of the droplet pooled inorder to complete operations 968, 970, 985 and 987.

Another example of a barcode oligonucleotide for use in RNA analysis,including messenger RNA (mRNA, including mRNA obtained from a cell)analysis is shown in FIG. 10. As shown, the overall oligonucleotide 1002is coupled to a bead 1004 by a releasable linkage 1006, such as adisulfide linker. The oligonucleotide may include functional sequencesthat are used in subsequent processing, such as functional sequence1008, which may include a sequencer specific flow cell attachmentsequence, e.g., a P7 sequence, as well as functional sequence 1010,which may include sequencing primer sequences, e.g., a R2 primer bindingsite. A barcode sequence 1012 is included within the structure for usein barcoding the sample RNA. An RNA specific (e.g., mRNA specific)priming sequence, such as poly-dT sequence 1014 may be included in theoligonucleotide structure. An anchoring sequence segment (not shown) maybe included to ensure that the poly-dT sequence hybridizes at thesequence end of the mRNA. An additional sequence segment 1016 may beprovided within the oligonucleotide sequence. This additional sequencecan provide a unique molecular sequence segment, as described elsewhereherein. An additional functional sequence 1020 may be included for invitro transcription, e.g., a T7 RNA polymerase promoter sequence. Aswill be appreciated, although shown as a single oligonucleotide tetheredto the surface of a bead, individual beads can include tens to hundredsof thousands or even millions of individual oligonucleotide molecules,where, as noted, the barcode segment can be constant or relativelyconstant for a given bead, but where the variable or unique sequencesegment will vary across an individual bead.
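By way of a non-limiting illustration, the segment layout of oligonucleotide 1002 can be sketched in Python as an ordered list of labeled segments. This is only a sketch under stated assumptions: the segment order follows the order in which segments 1020, 1008, 1012, 1010, 1016 and 1014 are listed for the cDNA transcript elsewhere herein, every base sequence shown is a placeholder rather than an actual adapter or promoter sequence, and the names Segment, assemble and locate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str     # role of the segment, with its reference numeral from the text
    sequence: str  # placeholder bases only; NOT real adapter or promoter sequences


def assemble(segments):
    """Concatenate the segments into one oligonucleotide string."""
    return "".join(seg.sequence for seg in segments)


def locate(segments, label):
    """Return the (start, end) offsets of the named segment in the assembled oligo."""
    position = 0
    for seg in segments:
        if seg.label == label:
            return position, position + len(seg.sequence)
        position += len(seg.sequence)
    raise KeyError(label)


# Hypothetical layout; segment order and all sequences are assumptions for illustration.
OLIGO_1002 = [
    Segment("functional_1020", "GGGG"),    # e.g., in vitro transcription promoter
    Segment("functional_1008", "AAAAAA"),  # e.g., flow cell attachment sequence
    Segment("barcode_1012", "ACGTACGT"),   # constant for a given bead
    Segment("functional_1010", "CCCCCC"),  # e.g., sequencing primer binding site
    Segment("unique_1016", "TGCATG"),      # varies between oligos on one bead
    Segment("poly_dT_1014", "T" * 12),     # mRNA-specific priming sequence
]

oligo = assemble(OLIGO_1002)
barcode_start, barcode_end = locate(OLIGO_1002, "barcode_1012")
```

Knowing the offsets of the barcode and unique segments in this way is what would allow a downstream analysis to slice them out of a sequencing read of known layout.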

In an example method of cellular RNA analysis and in reference to FIG.10, a cell is co-partitioned along with a barcode bearing bead, andother reagents such as reverse transcriptase, reducing agent and dNTPsinto a partition (e.g., a droplet in an emulsion). In operation 1050,the cell is lysed while the barcoded oligonucleotides 1002 are released(e.g., via the action of the reducing agent) from the bead, and thepoly-dT segment 1014 of the released barcode oligonucleotide thenhybridizes to the poly-A tail of mRNA 1020. Next at operation 1052, thepoly-dT segment is then extended in a reverse transcription reactionusing the mRNA as template to produce a cDNA transcript 1022 of the mRNAand also includes each of the sequence segments 1020, 1008, 1012, 1010,1016, and 1014 of the barcode oligonucleotide. Within any givenpartition, all of the cDNA transcripts of the individual mRNA moleculeswill include a common barcode sequence segment 1012. However, byincluding the unique random N-mer sequence, the transcripts made fromdifferent mRNA molecules within a given partition will vary at thisunique sequence. As described elsewhere herein, this provides aquantitation feature that can be identifiable even following anysubsequent amplification of the contents of a given partition, e.g., thenumber of unique segments associated with a common barcode can beindicative of the quantity of mRNA originating from a single partition,and thus, a single cell. At operation 1054 a second strand issynthesized and at operation 1056 the T7 promoter sequence can be usedby T7 polymerase to produce RNA transcripts in in vitro transcription.At operation 1058 the transcripts are fragmented (e.g., sheared),ligated to additional functional sequences, and reverse transcribed. 
The functional sequences may include a sequencer specific flow cell attachment sequence 1030, e.g., a P5 sequence, as well as functional sequence 1028, which may include sequencing primers, e.g., a R1 primer binding sequence, as well as functional sequence 1032, which may include a sample index, e.g., an i5 sample index sequence. At operation 1060 the RNA transcripts can be reverse transcribed to DNA, the DNA amplified (e.g., via PCR), and sequenced to identify the sequence of the cDNA transcript of the mRNA, as well as to sequence the barcode segment and the unique sequence segment. In some cases, operations 1050 and 1052 can occur in the partition, while operations 1054, 1056, 1058 and 1060 can occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled in order to complete operations 1054, 1056, 1058 and 1060.
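The quantitation feature described above, in which the number of unique segments associated with a common barcode indicates the quantity of mRNA from a partition, can be illustrated with a short Python sketch. It assumes, hypothetically, that each sequenced read has already been reduced to a (barcode, unique segment) pair; the function name and the example sequences are illustrative only, and no correction for sequencing errors is attempted.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct unique-segment sequences observed per barcode.

    reads: iterable of (barcode, unique_segment) string pairs (assumed input format).
    Amplification duplicates share both values and are therefore counted once.
    """
    seen = defaultdict(set)
    for barcode, unique_segment in reads:
        seen[barcode].add(unique_segment)
    return {barcode: len(segments) for barcode, segments in seen.items()}


# Hypothetical reads: the first two are amplification copies of one molecule.
example_reads = [
    ("AAAC", "GGT"),
    ("AAAC", "GGT"),
    ("AAAC", "TCA"),
    ("CCGT", "ATG"),
]
molecule_counts = count_unique_molecules(example_reads)
```

In this example the barcode "AAAC" is credited with two molecules even though three reads carry it, which is why the count remains meaningful after amplification.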

In an alternative example of a barcode oligonucleotide for use in RNA (e.g., cellular RNA) analysis as shown in FIG. 10, functional sequence 1008 may be a P5 sequence and functional sequence 1010 may be a R1 primer binding site. Moreover, the functional sequence 1030 may be a P7 sequence, functional sequence 1028 may be a R2 primer binding site, and functional sequence 1032 may be an i7 sample index sequence.

An additional example of a barcode oligonucleotide for use in RNAanalysis, including messenger RNA (mRNA, including mRNA obtained from acell) analysis is shown in FIG. 11. As shown, the overalloligonucleotide 1102 is coupled to a bead 1104 by a releasable linkage1106, such as a disulfide linker. The oligonucleotide may includefunctional sequences that are used in subsequent processing, such asfunctional sequence 1108, which may include a sequencer specific flowcell attachment sequence, e.g., a P5 sequence, as well as functionalsequence 1110, which may include sequencing primer sequences, e.g., a R1primer binding site. In some cases, sequence 1108 is a P7 sequence andsequence 1110 is a R2 primer binding site. A barcode sequence 1112 isincluded within the structure for use in barcoding the sample RNA. Anadditional sequence segment 1116 may be provided within theoligonucleotide sequence. In some cases, this additional sequence canprovide a unique molecular sequence segment, as described elsewhereherein. An additional sequence 1114 may be included to facilitatetemplate switching, e.g., polyG. As will be appreciated, although shownas a single oligonucleotide tethered to the surface of a bead,individual beads can include tens to hundreds of thousands or evenmillions of individual oligonucleotide molecules, where, as noted, thebarcode segment can be constant or relatively constant for a given bead,but where the variable or unique sequence segment will vary across anindividual bead.

In an example method of cellular mRNA analysis and in reference to FIG. 11, a cell is co-partitioned along with a barcode bearing bead, poly-dT sequence, and other reagents such as reverse transcriptase, a reducing agent and dNTPs into a partition (e.g., a droplet in an emulsion). In operation 1150, the cell is lysed while the barcoded oligonucleotides are released from the bead (e.g., via the action of the reducing agent) and the poly-dT sequence hybridizes to the poly-A tail of mRNA 1120 released from the cell. Next, in operation 1152, the poly-dT sequence is extended in a reverse transcription reaction using the mRNA as a template to produce a cDNA transcript 1122 complementary to the mRNA. Terminal transferase activity of the reverse transcriptase can add additional bases to the cDNA transcript (e.g., polyC). The additional bases added to the cDNA transcript, e.g., polyC, can then hybridize with sequence 1114 of the barcoded oligonucleotide. This can facilitate template switching, and a sequence complementary to the barcode oligonucleotide can be incorporated into the cDNA transcript. The transcripts can be further processed (e.g., amplified, portions removed, additional sequences added, etc.) and characterized as described elsewhere herein, e.g., by sequencing. The configuration of the constructs generated by such a method can help minimize (or avoid) sequencing of the poly-dT sequence during sequencing.

An additional example of a barcode oligonucleotide for use in RNAanalysis, including cellular RNA analysis is shown in FIG. 12A. Asshown, the overall oligonucleotide 1202 is coupled to a bead 1204 by areleasable linkage 1206, such as a disulfide linker. The oligonucleotidemay include functional sequences that are used in subsequent processing,such as functional sequence 1208, which may include a sequencer specificflow cell attachment sequence, e.g., a P5 sequence, as well asfunctional sequence 1210, which may include sequencing primer sequences,e.g., a R1 primer binding site. In some cases, sequence 1208 is a P7sequence and sequence 1210 is a R2 primer binding site. A barcodesequence 1212 is included within the structure for use in barcoding thesample RNA. An additional sequence segment 1216 may be provided withinthe oligonucleotide sequence. In some cases, this additional sequencecan provide a unique molecular sequence segment, as described elsewhereherein. As will be appreciated, although shown as a singleoligonucleotide tethered to the surface of a bead, individual beads caninclude tens to hundreds of thousands or even millions of individualoligonucleotide molecules, where, as noted, the barcode segment can beconstant or relatively constant for a given bead, but where the variableor unique sequence segment will vary across an individual bead. In anexample method of cellular RNA analysis using this barcode, a cell isco-partitioned along with a barcode bearing bead and other reagents suchas RNA ligase and a reducing agent into a partition (e.g. a droplet inan emulsion). The cell is lysed while the barcoded oligonucleotides arereleased (e.g., via the action of the reducing agent) from the bead. Thebarcoded oligonucleotides can then be ligated to the 5′ end of mRNAtranscripts while in the partitions by RNA ligase. 
Subsequent operations may include purification (e.g., via solid phase reversible immobilization (SPRI)) and further processing (shearing, ligation of functional sequences, and subsequent amplification (e.g., via PCR)), and these operations may occur in bulk (e.g., outside the partition). In the case where a partition is a droplet in an emulsion, the emulsion can be broken and the contents of the droplet pooled for the additional operations.

An additional example of a barcode oligonucleotide for use in RNAanalysis, including cellular RNA analysis is shown in FIG. 12B. Asshown, the overall oligonucleotide 1222 is coupled to a bead 1224 by areleasable linkage 1226, such as a disulfide linker. The oligonucleotidemay include functional sequences that are used in subsequent processing,such as functional sequence 1228, which may include a sequencer specificflow cell attachment sequence, e.g., a P5 sequence, as well asfunctional sequence 1230, which may include sequencing primer sequences,e.g., a R1 primer binding site. In some cases, sequence 1228 is a P7sequence and sequence 1230 is a R2 primer binding site. A barcodesequence 1232 is included within the structure for use in barcoding thesample RNA. A priming sequence 1234 (e.g., a random priming sequence)can also be included in the oligonucleotide structure, e.g., a randomhexamer. An additional sequence segment 1236 may be provided within theoligonucleotide sequence. In some cases, this additional sequenceprovides a unique molecular sequence segment, as described elsewhereherein. As will be appreciated, although shown as a singleoligonucleotide tethered to the surface of a bead, individual beads caninclude tens to hundreds of thousands or even millions of individualoligonucleotide molecules, where, as noted, the barcode segment can beconstant or relatively constant for a given bead, but where the variableor unique sequence segment will vary across an individual bead. In anexample method of cellular mRNA analysis using the barcodeoligonucleotide of FIG. 12B, a cell is co-partitioned along with abarcode bearing bead and additional reagents such as reversetranscriptase, a reducing agent and dNTPs into a partition (e.g., adroplet in an emulsion). The cell is lysed while the barcodedoligonucleotides are released from the bead (e.g., via the action of thereducing agent). In some cases, sequence 1228 is a P7 sequence andsequence 1230 is a R2 primer binding site. 
In other cases, sequence 1228is a P5 sequence and sequence 1230 is a R1 primer binding site. Thepriming sequence 1234 of random hexamers can randomly hybridize cellularmRNA. The random hexamer sequence can then be extended in a reversetranscription reaction using mRNA from the cell as a template to producea cDNA transcript complementary to the mRNA and also includes each ofthe sequence segments 1228, 1232, 1230, 1236, and 1234 of the barcodeoligonucleotide. Subsequent operations can include generation ofamplification products, purification (e.g., via solid phase reversibleimmobilization (SPRI)), further processing (e.g., shearing, ligation offunctional sequences, and subsequent amplification (e.g., via PCR)).These operations may occur in bulk (e.g., outside the partition). In thecase where a partition is a droplet in an emulsion, the emulsion can bebroken and the contents of the droplet pooled for additional operations.Additional reagents that may be co-partitioned along with the barcodebearing bead may include oligonucleotides to block ribosomal RNA (rRNA)and nucleases to digest genomic DNA from cells. Alternatively, rRNAremoval agents may be applied during additional processing operations.The configuration of the constructs generated by such a method can helpminimize (or avoid) sequencing of the poly-dT sequence during sequencingand/or sequence the 5′ end of a polynucleotide sequence. Theamplification products, for example, first amplification products and/orsecond amplification products, may be subject to sequencing for sequenceanalysis. In some cases, amplification may be performed using thePartial Hairpin Amplification for Sequencing (PHASE) method. The singlecell analysis methods described herein may also be useful in theanalysis of the whole transcriptome. Referring back to the barcode ofFIG. 12B, the priming sequence 1234 may be a random N-mer. In somecases, sequence 1228 is a P7 sequence and sequence 1230 is a R2 primerbinding site. 
In other cases, sequence 1228 is a P5 sequence and sequence 1230 is a R1 primer binding site. In an example method of whole transcriptome analysis using this barcode, the individual cell is co-partitioned along with a barcode bearing bead, poly-dT sequence, and other reagents such as reverse transcriptase, polymerase, a reducing agent and dNTPs into a partition (e.g., droplet in an emulsion). In an operation of this method, the cell is lysed while the barcoded oligonucleotides are released from the bead (e.g., via the action of the reducing agent) and the poly-dT sequence hybridizes to the poly-A tail of cellular mRNA. In a reverse transcription reaction using the mRNA as template, cDNA transcripts of cellular mRNA can be produced. The RNA can then be degraded with an RNase. The priming sequence 1234 in the barcoded oligonucleotide can then randomly hybridize to the cDNA transcripts. The oligonucleotides can be extended using polymerase enzymes and other extension reagents co-partitioned with the bead and cell, in a manner similar to that shown in FIG. 3, to generate amplification products (e.g., barcoded fragments) similar to the example amplification product shown in FIG. 3 (panel F). The barcoded nucleic acid fragments may, in some cases, be subjected to further processing (e.g., amplification, addition of additional sequences, clean up processes, etc., as described elsewhere herein) and characterized, e.g., through sequence analysis. In this operation, sequencing signals can come from full length RNA.

Although operations with various barcode designs have been discussed individually, individual beads can include barcode oligonucleotides of various designs for simultaneous use.

In addition to characterizing individual cells or cell sub-populations from larger populations, the processes and systems described herein may also be used to characterize individual cells as a way to provide an overall profile of a cellular, or other organismal population. A variety of applications require the evaluation of the presence and quantification of different cell or organism types within a population of cells, including, for example, microbiome analysis and characterization, environmental testing, food safety testing, epidemiological analysis, e.g., in tracing contamination or the like. In particular, the analysis processes described above may be used to individually characterize, sequence and/or identify large numbers of individual cells within a population. This characterization may then be used to assemble an overall profile of the originating population, which can provide important prognostic and diagnostic information.

For example, shifts in human microbiomes, including, e.g., gut, buccal, epidermal microbiomes, etc., have been identified as being both diagnostic and prognostic of different conditions or general states of health. Using the single cell analysis methods and systems described herein, one can again characterize, sequence and identify individual cells in an overall population, and identify shifts within that population that may be indicative of diagnostically relevant factors. By way of example, sequencing of bacterial 16S ribosomal RNA genes has been used as a highly accurate method for taxonomic classification of bacteria. Use of the targeted amplification and sequencing processes described above can provide identification of individual cells within a population of cells. One may further quantify the numbers of different cells within a population to identify current states or shifts in states over time. See, e.g., Morgan et al., PLoS Comput. Biol., Ch. 12, December 2012, 8(12):e1002808, and Ram et al., Syst. Biol. Reprod. Med., June 2011, 57(3):162-170, each of which is incorporated herein by reference in its entirety for all purposes. Likewise, identification and diagnosis of infection or potential infection may also benefit from the single cell analyses described herein, e.g., to identify microbial species present in large mixes of other cells or other biological material, cells and/or nucleic acids, including the environments described above, as well as any other diagnostically relevant environments, e.g., cerebrospinal fluid, blood, fecal or intestinal samples, or the like.
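As a non-limiting illustration of quantifying different cells within a population and identifying shifts over time, the following Python sketch computes the fraction of cells assigned to each type at two time points and the change between them. It assumes, hypothetically, that each individual cell has already been assigned a type label (for example, from its 16S sequence); the labels, cell numbers and function names are placeholders rather than measured data.

```python
from collections import Counter


def relative_abundance(cell_labels):
    """Return the fraction of cells carrying each label."""
    counts = Counter(cell_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def abundance_shift(before, after):
    """Return the change in fraction (after minus before) for every label seen."""
    labels = set(before) | set(after)
    return {label: after.get(label, 0.0) - before.get(label, 0.0) for label in labels}


# Hypothetical per-cell labels at two time points (placeholders, not real data).
cells_t0 = ["type_A"] * 6 + ["type_B"] * 4
cells_t1 = ["type_A"] * 3 + ["type_B"] * 5 + ["type_C"] * 2

profile_t0 = relative_abundance(cells_t0)
profile_t1 = relative_abundance(cells_t1)
shift = abundance_shift(profile_t0, profile_t1)
```

Fractions are used instead of raw counts so that samples containing different numbers of analyzed cells can be compared directly.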

The foregoing analyses may also be particularly useful in the characterization of potential drug resistance of different cells, e.g., cancer cells, bacterial pathogens, etc., through the analysis of distribution and profiling of different resistance markers/mutations across cell populations in a given sample. Additionally, characterization of shifts in these markers/mutations across populations of cells over time can provide valuable insight into the progression, alteration, prevention, and treatment of a variety of diseases characterized by such drug resistance issues.

Although described in terms of cells, it will be appreciated that any of a variety of individual biological organisms, or components of organisms are encompassed within this description, including, for example, cells, viruses, organelles, cellular inclusions, vesicles, or the like. Additionally, where referring to cells, it will be appreciated that such reference includes any type of cell, including without limitation prokaryotic cells, eukaryotic cells, bacterial, fungal, plant, mammalian, or other animal cell types, mycoplasmas, normal tissue cells, tumor cells, or any other cell type, whether derived from single cell or multicellular organisms.

Similarly, analysis of different environmental samples to profile the microbial organisms, viruses, or other biological contaminants that are present within such samples, can provide important information about disease epidemiology, and potentially aid in forecasting disease outbreaks, epidemics and pandemics.

As described above, the methods, systems and compositions described herein may also be used for analysis and characterization of other aspects of individual cells or populations of cells. In one example process, a sample is provided that contains cells that are to be analyzed and characterized as to their cell surface proteins. Also provided is a library of antibodies, antibody fragments, or other molecules having a binding affinity to the cell surface proteins or antigens (or other cell features) for which the cell is to be characterized (also referred to herein as cell surface feature binding groups). For ease of discussion, these affinity groups are referred to herein as binding groups. The binding groups can include a reporter molecule that is indicative of the cell surface feature to which the binding group binds. In particular, a binding group type that is specific to one type of cell surface feature will comprise a first reporter molecule, while a binding group type that is specific to a different cell surface feature will have a different reporter molecule associated with it. In some aspects, these reporter molecules will comprise oligonucleotide sequences. Oligonucleotide based reporter molecules provide advantages of being able to generate significant diversity in terms of sequence, while also being readily attachable to most biomolecules, e.g., antibodies, etc., as well as being readily detected, e.g., using sequencing or array technologies. In the example process, the binding groups include oligonucleotides attached to them. Thus, a first binding group type, e.g., antibodies to a first type of cell surface feature, will have associated with it a reporter oligonucleotide that has a first nucleotide sequence. 
Different binding group types, e.g., antibodies having binding affinity for other, different cell surface features, will have associated therewith reporter oligonucleotides that comprise different nucleotide sequences, e.g., having a partially or completely different nucleotide sequence. In some cases, for each type of cell surface feature binding group, e.g., antibody or antibody fragment, the reporter oligonucleotide sequence may be known and readily identifiable as being associated with the known cell surface feature binding group. These oligonucleotides may be directly coupled to the binding group, or they may be attached to a bead, molecular lattice, e.g., a linear, globular, cross-linked, or other polymer, or other framework that is attached or otherwise associated with the binding group, which allows attachment of multiple reporter oligonucleotides to a single binding group.

In the case of multiple reporter molecules coupled to a single binding group, such reporter molecules can comprise the same sequence, or a particular binding group will include a known set of reporter oligonucleotide sequences. As between different binding groups, e.g., specific for different cell surface features, the reporter molecules can be different and attributable to the particular binding group.

Attachment of the reporter groups to the binding groups may be achieved through any of a variety of direct or indirect, covalent or non-covalent associations or attachments. For example, in the case of oligonucleotide reporter groups associated with antibody based binding groups, such oligonucleotides may be covalently attached to a portion of an antibody or antibody fragment using chemical conjugation techniques (e.g., Lightning-Link® antibody labeling kits available from Innova Biosciences), as well as other non-covalent attachment mechanisms, e.g., using biotinylated antibodies and oligonucleotides (or beads that include one or more biotinylated linkers, coupled to oligonucleotides) with an avidin or streptavidin linker. Antibody and oligonucleotide biotinylation techniques are available (See, e.g., Fang, et al., Fluoride-Cleavable Biotinylation Phosphoramidite for 5′-end-Labeling and Affinity Purification of Synthetic Oligonucleotides, Nucleic Acids Res. Jan. 15, 2003; 31(2):708-715; DNA 3′ End Biotinylation Kit, available from Thermo Scientific; and the SiteClick™ Antibody Labeling System available from Thermo Fisher Scientific, the full disclosures of which are incorporated herein by reference in their entirety for all purposes). Likewise, protein and peptide biotinylation techniques have been developed and are readily available (See, e.g., U.S. Pat. No. 6,265,552, the full disclosure of which is incorporated herein by reference in its entirety for all purposes).

The reporter oligonucleotides may be provided having any of a range of different lengths, depending upon the diversity of reporter molecules desired for a given analysis, the sequence detection scheme employed, and the like. In some cases, these reporter sequences can be greater than about 5 nucleotides in length, greater than about 10 nucleotides in length, greater than about 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150 or even 200 nucleotides in length. In some cases, these reporter nucleotides may be less than about 250 nucleotides in length, less than about 200, 180, 150, 120, 100, 90, 80, 70, 60, 50, 40, or even 30 nucleotides in length. In many cases, the reporter oligonucleotides may be selected to provide barcoded products that are already sized, and otherwise configured to be analyzed on a sequencing system. For example, these sequences may be provided at a length that ideally creates sequenceable products of a desired length for particular sequencing systems. Likewise, these reporter oligonucleotides may include additional sequence elements, in addition to the reporter sequence, such as sequencer attachment sequences, sequencing primer sequences, amplification primer sequences, or the complements to any of these.
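The relationship between reporter length and available sequence diversity can be illustrated with simple arithmetic, sketched here in Python. The sketch assumes four possible bases at each position and treats every sequence as usable, which gives only a theoretical upper bound; practical designs would likely exclude sequences that are too similar to one another or otherwise undesirable. The function names are hypothetical.

```python
def max_distinct_reporters(length):
    """Upper bound on distinct reporter sequences of a given length (4 bases per position)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def min_length_for(n_reporters):
    """Shortest reporter length whose upper bound covers n_reporters distinct sequences."""
    if n_reporters < 1:
        raise ValueError("n_reporters must be at least 1")
    length = 1
    while max_distinct_reporters(length) < n_reporters:
        length += 1
    return length


bound_for_5_nt = max_distinct_reporters(5)   # 4**5
length_for_100 = min_length_for(100)         # 4**3 = 64 is too few, 4**4 = 256 suffices
```

Under these assumptions even a short reporter sequence admits far more sequences than there are cell surface features in a typical panel, which leaves room to choose well-separated sequences.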

In operation, a cell-containing sample is incubated with the binding molecules and their associated reporter oligonucleotides, for any of the cell surface features desired to be analyzed. Following incubation, the cells are washed to remove unbound binding groups. Following washing, the cells are partitioned into separate partitions, e.g., droplets, along with the barcode carrying beads described above, where each partition includes a limited number of cells, e.g., in some cases, a single cell. Upon releasing the barcodes from the beads, they will prime the amplification and barcoding of the reporter oligonucleotides. As noted above, the barcoded replicates of the reporter molecules may additionally include functional sequences, such as primer sequences, attachment sequences or the like.

The barcoded reporter oligonucleotides are then subjected to sequence analysis to identify which reporter oligonucleotides bound to the cells within the partitions. Further, by also sequencing the associated barcode sequence, one can identify that a given cell surface feature likely came from the same cell as other, different cell surface features, whose reporter sequences include the same barcode sequence, i.e., they were derived from the same partition. Based upon the reporter molecules that emanate from an individual partition, as identified by the presence of the barcode sequence, one may then create a cell surface profile of individual cells from a population of cells. Profiles of individual cells or populations of cells may be compared to profiles from other cells, e.g., ‘normal’ cells, to identify variations in cell surface features, which may provide diagnostically relevant information. In particular, these profiles may be useful in the diagnosis of a variety of disorders that are characterized by variations in cell surface receptors, such as cancer and other disorders.

IV. Engineered Reverse Transcription Enzyme Variants

One of the major challenges in cDNA synthesis reactions is interference from RNA secondary structures. While a higher reaction temperature can remove secondary structure from the template RNA, elevated temperatures typically lead to lower reverse-transcriptase (RT) enzyme activity unless an efficient, thermostable RT enzyme is used. Additionally, RT enzyme activity can be reduced by inhibitors, such as cell lysates and associated reagents.

Wild-type (WT) Moloney Murine Leukemia Virus (MMLV) reverse-transcriptase is an RT enzyme that is typically inactivated at higher temperatures. However, several commercially available mutant MMLV RT enzymes exhibit improved thermostability, fidelity, substrate affinity, and/or reduced terminal deoxynucleotidyltransferase activity.

Disclosed herein, in some embodiments, are engineered reverse transcription enzymes, comprising an amino acid sequence that is at least 80% identical to SEQ ID NO: 3, wherein said amino acid sequence comprises: (a) a truncation of at least 15 amino acids from the N-terminus relative to SEQ ID NO: 3; and (b) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. In some instances, the one or more mutations in (b) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.
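By way of non-limiting illustration, the substitution notation used above (e.g., E69K, meaning that the residue E at position 69 of the reference is replaced by K) can be sketched in code. The sketch below assumes 1-based position numbering relative to the reference sequence; the names `parse_substitution`, `apply_substitutions`, and `TOY_REFERENCE` are hypothetical and do not appear in this disclosure, and the toy reference is an arbitrary 8-residue string rather than SEQ ID NO: 3.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str    # residue expected in the reference
    position: int     # 1-based position in the reference (assumption)
    replacement: str  # residue present in the variant


# One-letter residue, position digits, one-letter residue (e.g. "E69K").
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'E69K' into its three parts."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return Substitution(wild_type, int(position), replacement)


def apply_substitutions(reference: str, labels: list) -> str:
    """Return a copy of `reference` carrying each labelled substitution."""
    residues = list(reference)
    for label in labels:
        sub = parse_substitution(label)
        if not 1 <= sub.position <= len(residues):
            raise ValueError(f"{label}: position outside the reference")
        if residues[sub.position - 1] != sub.wild_type:
            raise ValueError(f"{label}: reference residue does not match")
        residues[sub.position - 1] = sub.replacement
    return "".join(residues)


# Arbitrary toy reference for illustration only (not SEQ ID NO: 3).
TOY_REFERENCE = "MKTEELDW"
toy_variant = apply_substitutions(TOY_REFERENCE, ["E4K", "D7N"])
print(toy_variant)
```

Checking the reference residue before substituting guards against labels numbered against a different reference, for example a tagged or truncated construct.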

In some embodiments, engineered reverse transcription enzymes further comprise an affinity tag at said N-terminus or at a C-terminus of said amino acid sequence. In some instances, said affinity tag includes, but is not limited to, albumin binding protein (ABP), AU1 epitope, AU5 epitope, T7-tag, V5-tag, B-tag, Chloramphenicol Acetyl Transferase (CAT), Dihydrofolate reductase (DHFR), AviTag, Calmodulin-tag, polyglutamate tag, E-tag, FLAG-tag, HA-tag, Myc-tag, NE-tag, S-tag, SBP-tag, Softag 1, Softag 3, Spot-tag, tetracysteine (TC) tag, Ty tag, VSV-tag, Xpress tag, biotin carboxyl carrier protein (BCCP), green fluorescent protein tag, HaloTag, Nus-tag, thioredoxin-tag, Fc-tag, cellulose binding domain, chitin binding protein (CBP), choline-binding domain, galactose binding domain, maltose binding protein (MBP), Horseradish Peroxidase (HRP), Strep-tag, HSV epitope, Ketosteroid isomerase (KSI), KT3 epitope, LacZ, Luciferase, PDZ domain, PDZ ligand, Polyarginine (Arg-tag), Polyaspartate (Asp-tag), Polycysteine (Cys-tag), Polyphenylalanine (Phe-tag), Profinity eXact, Protein C, S1-tag, Staphylococcal protein A (Protein A), Staphylococcal protein G (Protein G), Small Ubiquitin-like Modifier (SUMO), Tandem Affinity Purification (TAP), TrpE, Ubiquitin, Universal, glutathione-S-transferase (GST), and poly(His) tag. In some instances, said affinity tag comprises at least 5 histidine amino acids.

In some embodiments, engineered reverse transcription enzymes further comprise a protease cleavage sequence, wherein cleavage of said protease cleavage sequence by a protease results in cleavage of said affinity tag from said engineered reverse transcription enzyme. In some instances, said protease cleavage sequence is a protease cleavage sequence recognized by a protease including, but not limited to, alanine carboxypeptidase, Armillaria mellea astacin, bacterial leucyl aminopeptidase, cancer procoagulant, cathepsin B, clostripain, cytosol alanyl aminopeptidase, elastase, endoproteinase Arg-C, enterokinase, gastricsin, gelatinase, Gly-X carboxypeptidase, glycyl endopeptidase, human rhinovirus 3C protease, hypodermin C, IgA-specific serine endopeptidase, leucyl aminopeptidase, leucyl endopeptidase, lysC, lysosomal pro-X carboxypeptidase, lysyl aminopeptidase, methionyl aminopeptidase, myxobacter, nardilysin, pancreatic endopeptidase E, picornain 2A, picornain 3C, proendopeptidase, prolyl aminopeptidase, proprotein convertase I, proprotein convertase II, russellysin, saccharopepsin, semenogelase, T-plasminogen activator, thrombin, tissue kallikrein, tobacco etch virus (TEV), togavirin, tryptophanyl aminopeptidase, U-plasminogen activator, V8, venombin A, venombin AB, and Xaa-pro aminopeptidase. In some instances, said protease cleavage sequence is a thrombin cleavage sequence.

Disclosed herein, in some embodiments, are engineered reverse transcription enzyme variants, comprising an amino acid sequence that is at least 80% identical to the amino acid (polypeptide) sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by two or more of: (i) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (ii) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (iii) a thrombin cleavage recognition site; and (iv) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. Disclosed herein, in some embodiments, are engineered reverse transcription enzyme variants, comprising an amino acid sequence that is at least 80% identical to the amino acid (polypeptide) sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by two or more of: (i) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (ii) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (iii) a thrombin cleavage recognition site; and (iv) one or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

Wild-type MMLV expresses a 1738-amino acid polypeptide chain (see, e.g., UniProt P03355) which is processed by viral protease p14 into a number of mature proteins, including the wild-type MMLV p80 reverse transcriptase enzyme (see, e.g., SEQ ID NO: 3).

TABLE 1 Sequences SEQ ID NO: 1 ATGGGTAGCTCACATCACCATCATCATCATTCTTCTGGTCTGGTCCCACGCGGCAGCACTTGGCTGTCTGATTTCCCTCAGGCGTGGGCCGAAACGGGTGGCATGGGTCTGGCAGTGCGTCAGGCACCGCTGATTATTCCGCTGAAAGCGACGTCGACCCCGGTGAGCATCAAGCAATATCCGATGTCCCAAGAGGCGCGCTTAGGTATTAAGCCGCACATTCAGCGTCTGCTGGATCAAGGTATTCTGGTTCCGTGTCAGAGCCCGTGGAATACCCCGCTTCTCCCGGTGAAGAAACCGGGCACGAACGATTACCGTCCAGTCCAAGACTTGCGCGAAGTTAACAAGCGCGTTGAAGATATTCACCCGACCGTCCCGAACCCGTACAATCTGCTGAGCGGTCTGCCGCCAAGCCACCAATGGTACACCGTGCTGGATCTGAAAGATGCTTTCTTCTGTCTGCGTCTGCACCCAACCAGCCAGCCTCTGTTTGCATTTGAGTGGCGTGACCCTGAGATGGGTATTAGCGGCCAGCTGACGTGGACCCGCCTGCCGCAAGGTTTTAAGAATTCCCCTACGCTGTTTGACGAAGCGCTGCACCGTGACCTGGCGGATTTCCGTATCCAGCACCCGGACCTGATCTTGCTGCAGTACGTTGATGACCTGTTGCTGGCGGCGACGAGCGAGCTGGATTGCCAACAGGGCACCCGTGCGCTGTTGCAGACCTTGGGTAACCTGGGTTATCGCGCTAGCGCGAAGAAAGCGCAGATTTGCCAAAAACAAGTTAAGTATCTGGGCTACCTGTTAAAGGAAGGCCAACGTTGGCTGACCGAAGCCCGCAAAGAAACTGTCATGGGTCAGCCGACCCCGAAAACGCCACGCCAACTGCGTGAGTTCTTGGGCACCGCGGGTTTCTGCCGCCTGTGGATCCCGGGCTTTGCCGAAATGGCAGCCCCGCTGTATCCGTTGACCAAGACCGGCACCCTGTTCAACTGGGGTCCGGACCAGCAGAAAGCGTACCAAGAAATTAAACAAGCACTGCTGACGGCACCGGCGCTGGGTCTGCCGGACCTGACCAAGCCGTTTGAGCTGTTCGTGGATGAGAAGCAAGGTTACGCGAAGGGCGTGTTGACCCAGAAATTGGGTCCGTGGCGTCGTCCGGTTGCATACCTGTCCAAGAAACTGGACCCGGTTGCTGCTGGTTGGCCGCCTTGCCTGCGCATGGTTGCCGCTATCGCGGTGCTGACTAAAGACGCGGGTAAGCTGACGATGGGTCAACCGCTGGTGATCAAGGCACCGCATGCAGTCGAGGCCCTTGTTAAGCAACCGCCAGATAGATGGCTGAGCAACGCGCGTATGACGCATTACCAGGCACTGCTGTTGGACACCGATCGTGTGCAGTTTGGCCCGGTCGTTGCGCTCAACCCGGCGACCCTGCTGCCGCTCCCGGAAGAAGGCTTGCAGCACAACTGTTTGGACATCCTGGCAGAGGCGCACGGCACTCGCCCGGATCTGACGGACCAGCCGCTGCCGGACGCCGATCATACCTGGTATACGAATGGTAGCAGCCTGTTGCAAGAGGGTCAGCGTAAGGCCGGTGCCGCGGTCACCACCGAGACTGAAGTGATTTGGGCTAAAGCATTGCCTGCGGGTACCAGCGCGCAGCGTGCCGAGCTGATCGCACTGACCCAAGCGCTGAAAATGGCTGAGGGTAAGAAACTGAATGTGTACACGGATAGCCGTTATGCCTTTGCGACCGCCCACATTCACGGCGAGATCTATCGCCGTCGCGGCCTGCTGACGTCCGAAGGCAAAGAGATCAAGAATAAAGACGAAATTCTGGCGCTGCTGAAAGCGCTGTTCCTGCCGAAACGTCTGTCGATCATCCATTGCCCGGGTCACCAGAAAGGCCACAGCGCAGAGGCGCGTGGTAATCGCATGGCTGACCAGGCTGCGCGTAAAGCCGCAA
TTACCGAAACCCCGGACACCAGCACGCTGCTGATCGAGAATAGCAGCCCGAACAGCCGTCTGATC AATTGATAA SEQ ID NO: 2MGSSHHHHHHSSGLVPRGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVIKAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTNGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTST LLIENSSPNSRLIN SEQ ID NO: 3TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSA EARGNRMADQAARKAAITETPDTSTLLSEQ ID NO: 4 
ATGGGTAGCTCACATCACCATCATCATCATTCTTCTGGTCTGGTCCCACGCGGCAGCACTTGGCTGTCTGATTTCCCTCAGGCGTGGGCCGAAACGGGTGGCATGGGTCTGGCAGTGCGTCAGGCACCGCTGATTATTCCGCTGAAAGCGACGTCGACCCCGGTGAGCATCAAGCAATATCCGATGTCCCAAAAGGCGCGCTTAGGTATTAAGCCGCACATTCAGCGTCTGCTGGATCAAGGTATTCTGGTTCCGTGTCAGAGCCCGTGGAATACCCCGCTTCTCCCGGTGAAGAAACCGGGCACGAACGATTACCGTCCAGTCCAAGACTTGCGCGAAGTTAACAAGCGCGTTGAAGATATTCACCCGACCGTCCCGAACCCGTACAATCTGCTGAGCGGTCCGCCGCCAAGCCACCAATGGTACACCGTGCTGGATCTGAAAGATGCTTTCTTCTGTCTGCGTCTGCACCCAACCAGCCAGCCTCTGTTTGCATTTGAGTGGCGTGACCCTGAGATGGGTATTAGCGGCCAGCTGACGTGGACCCGCCTGCCGCAAGGTTTTAAGAATTCCCCTACGCTGTTTAACGAAGCGCTGCACCGTGACCTGGCGGATTTCCGTATCCAGCACCCGGACCTGATCTTGCTGCAGTACGTTGATGACCTGTTGCTGGCGGCGACGAGCGAGCTGGATTGCCAACAGGGCACCCGTGCGCTGTTGCAGACCTTGGGTAACCTGGGTTATCGCGCTAGCGCGAAGAAAGCGCAGATTTGCCAAAAACAAGTTAAGTATCTGGGCTACCTGTTAAAGGAAGGCCAACGTTGGCTGACCGAAGCCCGCAAAGAAACTGTCATGGGTCAGCCGACCCCGAAAACGCCACGCCAACTGCGTAGGTTCTTGGGCAAAGCGGGTTTCTGCCGCCTGTTCATCCCGGGCTTTGCCGAAATGGCAGCCCCGCTGTATCCGTTGACCAAGCCGGGCACCCTGTTCAACTGGGGTCCGGACCAGCAGAAAGCGTACCAAGAAATTAAACAAGCACTGCTGACGGCACCGGCGCTGGGTCTGCCGGACCTGACCAAGCCGTTTGAGCTGTTCGTGGATGAGAAGCAAGGTTACGCGAAGGGCGTGTTGACCCAGAAATTGGGTCCGTGGCGTCGTCCGGTTGCATACCTGTCCAAGAAACTGGACCCGGTTGCTGCTGGTTGGCCGCCTTGCCTGCGCATGGTTGCCGCTATCGCGGTGCTGACTAAAGACGCGGGTAAGCTGACGATGGGTCAACCGCTGGTGATCAAGGCACCGCATGCAGTCGAGGCCCTTGTTAAGCAACCGGCAGGCAGATGGCTGAGCAAGGCGCGTATGACGCATTACCAGGCACTGCTGTTGGACACCGATCGTGTGCAGTTTGGCCCGGTCGTTGCGCTCAACCCGGCGACCCTGCTGCCGCTCCCGGAAGAAGGCTTGCAGCACAACTGTTTGGACATCCTGGCAGAGGCGCACGGCACTCGCCCGGATCTGACGGACCAGCCGCTGCCGGACGCCGATCATACCTGGTATACGAATGGTAGCAGCCTGTTGCAAGAGGGTCAGCGTAAGGCCGGTGCCGCGGTCACCACCGAGACTGAAGTGATTTGGGCTAAAGCATTGCCTGCGGGTACCAGCGCGCAGCGTGCCGAGCTGATCGCACTGACCCAAGCGCTGAAAATGGCTGAGGGTAAGAAACTGAATGTGTACACGGATAGCCGTTATGCCTTTGCGACCGCCCACATTCACGGCGAGATCTATCGCCGTCGCGGCTGGCTGACGTCCAAAGGCAAAGAGATCAAGAATAAAGACGAAATTCTGGCGCTGCTGAAAGCGCTGTTCCTGCCGAAACGTCTGTCGATCATCCATTGCCCGGGTCACCAGAAAGGCCACAGCGCAGAGGCGCGTGGTAATCGCATGGCTGACCAGGCTGCGCGTAAAGCCGCAATTACCGAAACCCCGGACACCAGCACGCTGCT
GATCGAGAATAGCAGCCCGAACAGCCGTCTGAT CAATTGATAA SEQ ID NO: 5MGSSHHHHHHSSGLVPRGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQKARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGPPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRRFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVIKAPHAVEALVKQPAGRWLSKARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTNGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSKGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTST LLIENSSPNSRLIN SEQ ID NO: 6TWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQKARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGPPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRRFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVIKAPHAVEALVKQPAGRWLSKARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTNGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSKGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPD TSTLL
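By way of non-limiting illustration, the relationship between a nucleotide sequence and a polypeptide sequence in Table 1 can be checked by translating codons under the standard genetic code. The following is a minimal sketch; the names `CODON_TABLE` and `translate` are hypothetical, and only the first 30 nucleotides of SEQ ID NO: 1 as printed in Table 1 are used. Those 30 nucleotides translate to the first ten residues printed for SEQ ID NO: 2.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon.

    Whitespace is ignored; a trailing partial codon is dropped.
    Codons containing characters other than A, C, G, T raise KeyError.
    """
    sequence = "".join(nucleotides.split()).upper()
    protein = []
    for start in range(0, len(sequence) - len(sequence) % 3, 3):
        residue = CODON_TABLE[sequence[start:start + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


# First 30 nucleotides of SEQ ID NO: 1 as printed in Table 1.
seq1_prefix = "ATGGGTAGCTCACATCACCATCATCATCAT"
print(translate(seq1_prefix))
```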

As used herein, a “variant” may have at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 88%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to a polypeptide sequence when optimally aligned for comparison.

As used herein, a polypeptide having a certain percent (e.g., at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) of sequence identity with another sequence means that, when aligned, that percentage of bases or amino acid residues are the same in comparing the two sequences. This alignment and the percent homology or identity can be determined using any suitable software program known in the art, for example those described in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel et al., eds., 1987, Supplement 30, section 7.7.18. Representative programs include the Vector NTI Advance™ 9.0 (Invitrogen Corp. Carlsbad, Calif.), GCG Pileup, FASTA (Pearson et al. (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448), and BLAST (BLAST Manual, Altschul et al., Nat'l Cent. Biotechnol. Inf., Nat'l Lib. Med. (NCBI NLM NIH), Bethesda, Md., and Altschul et al., (1997) Nucleic Acids Res. 25:3389-3402) programs. Another typical alignment program is ALIGN Plus (Scientific and Educational Software, PA), generally using default parameters. Another sequence software program that finds use is the TFASTA Data Searching Program available in the Sequence Software Package Version 6.0 (Genetics Computer Group, University of Wisconsin, Madison, Wis.).
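By way of non-limiting illustration, once two sequences have been aligned by any of the programs named above, percent identity can be computed by counting identical aligned positions. The sketch below is one possible convention and is not the method of any particular program: it assumes the alignment is supplied as two equal-length strings with "-" marking gaps, and it divides by the number of alignment columns in which at least one sequence has a residue. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns holding identical residues.

    Assumptions (labelled, not taken from the disclosure): inputs are
    pre-aligned, of equal length, and use '-' for gaps. Columns that are
    gaps in both sequences are skipped; a column with a gap in only one
    sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy sequences for illustration only.
print(percent_identity("MKTEELDW", "MKTKELNW"))
print(percent_identity("MKT-ELDW", "MKTKELDW"))
```

Other denominators (for example, the length of the shorter sequence) are also in use, so a reported percent identity is best read together with the convention that produced it.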

In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to a WT MMLV RT enzyme (SEQ ID NO: 3). In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80% identical to a nucleotide sequence according to SEQ ID NO: 1. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 2. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a nucleotide sequence according to SEQ ID NO: 1. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 2. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to a WT MMLV RT enzyme (SEQ ID NO: 3) over a span of at least 150 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 450 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 2 over a span of at least 150 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 450 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 2 over a span of at least 150 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 3 over a span of at least 150 amino acid residues.
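By way of non-limiting illustration, identity "over a span" of a given number of residues can be evaluated with a sliding window. The sketch below rests on stated assumptions that are not drawn from this disclosure: the two sequences are already aligned without gaps and have equal length, and the span criterion is read as "some contiguous window of exactly the given length meets the threshold". The function names `best_span_identity` and `meets_span_threshold` are hypothetical.

```python
def best_span_identity(reference: str, variant: str, span: int) -> float:
    """Highest percent identity found in any window of `span` positions.

    Assumes `reference` and `variant` are gap-free, pre-aligned, and of
    equal length (an illustrative simplification).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    if not 1 <= span <= len(reference):
        raise ValueError("span must be between 1 and the sequence length")
    # 1 where the residues agree, 0 where they differ.
    flags = [1 if a == b else 0 for a, b in zip(reference, variant)]
    window = sum(flags[:span])
    best = window
    # Slide the window one position at a time, updating the running count.
    for end in range(span, len(flags)):
        window += flags[end] - flags[end - span]
        best = max(best, window)
    return 100.0 * best / span


def meets_span_threshold(reference: str, variant: str,
                         span: int, threshold: float) -> bool:
    """True if some window of `span` positions reaches `threshold` percent."""
    return best_span_identity(reference, variant, span) >= threshold


# Toy sequences for illustration only.
toy_reference = "MKTEELDWAA"
toy_variant = "MKTKELNWAA"
print(best_span_identity(toy_reference, toy_variant, 5))
print(meets_span_threshold(toy_reference, toy_variant, 5, 80.0))
```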

In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to a WT MMLV RT enzyme (SEQ ID NO: 3) over a span of at least 300 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 900 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 2 over a span of at least 300 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 900 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 2 over a span of at least 300 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 3 over a span of at least 300 amino acid residues.

In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to a WT MMLV RT enzyme (SEQ ID NO: 3) over a span of at least 450 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 1,350 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 2 over a span of at least 450 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 1,350 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 2 over a span of at least 450 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 3 over a span of at least 450 amino acid residues.

In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to a WT MMLV RT enzyme (SEQ ID NO: 3) over a span of at least 600 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 1,800 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 2 over a span of at least 600 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises a nucleotide sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a nucleotide sequence according to SEQ ID NO: 1 over a span of at least 1,800 nucleotides. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 2 over a span of at least 600 amino acid residues. In some embodiments, the engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to SEQ ID NO: 3 over a span of at least 600 amino acid residues.

In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 15 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme (e.g., SEQ ID NO: 3). In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 20 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 21 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 25 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 30 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 35 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 40 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, or at least 40 amino acids from the N terminus of the engineered reverse transcription enzyme, as compared to a WT MMLV RT enzyme (SEQ ID NO: 3). In some embodiments, the N-terminal truncation in the engineered reverse transcription enzyme increases protein solubility as compared to a WT MMLV RT.

In some embodiments, the engineered reverse transcription enzyme comprises a sequence of at least 5 histidine amino acids at the N terminus of the enzyme. In some embodiments, the engineered reverse transcription enzyme comprises 6 histidine amino acids at the N terminus of the engineered reverse transcription enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a thrombin cleavage recognition site. In some embodiments, the engineered reverse transcription enzyme comprises a sequence of at least 5 histidine amino acids at the N terminus of the enzyme and a thrombin cleavage recognition site. In some embodiments, the engineered reverse transcription enzyme comprises 6 histidine amino acids and a thrombin cleavage recognition site at the N-terminus of the engineered reverse transcription enzyme. In some embodiments, the 6 histidine amino acids and thrombin cleavage recognition site at the N-terminus of the engineered reverse transcription enzyme have an amino acid sequence of MRSSHHHHHHSSGLVPR (SEQ ID NO: 7).

In some embodiments, the engineered reverse transcription enzyme comprises at least 5 histidine amino acids at the N-terminus and/or a thrombin cleavage sequence at the N terminus of the engineered reverse transcription enzyme in addition to the N-terminal truncations described above. For example, in some embodiments, the engineered reverse transcription enzyme comprises (a) a truncation of at least 15 amino acids from the N terminus of the engineered RT enzyme as compared to a WT MMLV RT enzyme, (b) at least 5 histidine amino acids at the N-terminus of the engineered reverse transcription enzyme, and (c) a thrombin cleavage recognition site at the N terminus of the enzyme. In some embodiments, the engineered reverse transcription enzyme comprises (a) a truncation of at least 21 amino acids from the N terminus of the engineered RT enzyme as compared to a WT MMLV RT enzyme, (b) at least 5 histidine amino acids at the N-terminus of the engineered reverse transcription enzyme, and (c) a thrombin cleavage recognition site at the N terminus of the enzyme. In some embodiments, the engineered reverse transcription enzyme comprises (a) a truncation of at least 25 amino acids from the N terminus of the engineered RT enzyme as compared to a WT MMLV RT enzyme, (b) at least 5 histidine amino acids at the N-terminus of the engineered reverse transcription enzyme, and (c) a thrombin cleavage recognition site at the N terminus of the enzyme.

In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 15 amino acids from the N terminus of the engineered RT enzyme as compared to a WT MMLV RT enzyme and a MRSSHHHHHHSSGLVPR amino acid sequence at the N terminus of the engineered reverse transcription enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 21 amino acids from the N terminus of the engineered RT enzyme as compared to a WT MMLV RT enzyme and further comprises a MRSSHHHHHHSSGLVPR amino acid sequence at the N terminus of the engineered reverse transcription enzyme. In some embodiments, the engineered reverse transcription enzyme comprises a truncation of at least 25 amino acids from the N terminus of the engineered RT enzyme as compared to a WT MMLV RT enzyme and further comprises a MRSSHHHHHHSSGLVPR amino acid sequence at the N terminus of the engineered reverse transcription enzyme.
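By way of non-limiting illustration, combining an N-terminal truncation with the MRSSHHHHHHSSGLVPR sequence can be sketched as a string operation: remove the first n residues of the reference and prepend the tag. The names `build_construct` and `has_histidine_run` are hypothetical, and the reference used here is only the first 31 residues of SEQ ID NO: 3 as printed in Table 1, chosen to keep the example short.

```python
import re

# Tag sequence as given in the text (SEQ ID NO: 7).
N_TERMINAL_TAG = "MRSSHHHHHHSSGLVPR"


def build_construct(reference: str, truncation: int,
                    tag: str = N_TERMINAL_TAG) -> str:
    """Drop `truncation` residues from the N terminus, then prepend `tag`."""
    if not 0 <= truncation <= len(reference):
        raise ValueError("truncation must lie within the reference length")
    return tag + reference[truncation:]


def has_histidine_run(sequence: str, minimum: int = 5) -> bool:
    """True if `sequence` contains at least `minimum` consecutive H residues."""
    return re.search("H{%d,}" % minimum, sequence) is not None


# First 31 residues of SEQ ID NO: 3 as printed in Table 1 (fragment only).
reference_fragment = "TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQ"

construct = build_construct(reference_fragment, truncation=21)
print(construct)
print(has_histidine_run(construct))
```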

In some embodiments, the engineered reverse transcription enzyme comprises one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises two or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises three or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises four or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises five or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises six or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises seven or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises eight or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises nine or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises ten or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises eleven or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises twelve or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises thirteen or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises one or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises two or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises three or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises four or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises five or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises six or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises seven or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises eight or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises nine or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises ten or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises eleven or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises twelve or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises thirteen or more mutations selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to an amino acid sequence of SEQ ID NO: 3.
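The mutation labels used above follow the common convention of wild-type residue, 1-based position, and replacement residue (for example, E69K replaces glutamate at position 69 with lysine). A minimal Python sketch of how such labels might be parsed and applied to a reference sequence follows. The reference built by `toy_reference` is a hypothetical placeholder that only carries the stated wild-type residues at the listed positions; it is not SEQ ID NO: 3, and all function names are illustrative assumptions.

```python
import re

# Label format: wild-type residue, 1-based position, replacement residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'E69K' into ('E', 69, 'K')."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(reference: str, labels: list[str]) -> str:
    """Return a copy of reference with each labeled substitution applied."""
    residues = list(reference)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: reference does not have {wild_type} "
                             f"at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def toy_reference(labels: list[str], length: int = 607,
                  filler: str = "A") -> str:
    """Placeholder reference (NOT SEQ ID NO: 3): filler residues everywhere
    except the wild-type residues named in the labels."""
    residues = [filler] * length
    for label in labels:
        wild_type, position, _ = parse_mutation(label)
        residues[position - 1] = wild_type
    return "".join(residues)


labels = ["E69K", "L139P", "D524N"]
reference = toy_reference(labels)
variant = apply_mutations(reference, labels)
```

Checking the wild-type residue before substituting guards against numbering the positions relative to a different reference, for example a truncated or tagged construct rather than SEQ ID NO: 3.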

In some embodiments, the engineered reverse transcription enzyme comprises (i) a truncation of at least 15 amino acids from the N terminus; (ii) a sequence of at least 5 histidine amino acids at the N terminus; (iii) a thrombin cleavage recognition site; and (iv) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (iv) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises (i) a truncation of at least 21 amino acids from the N terminus; (ii) a sequence of at least 5 histidine amino acids at the N terminus; (iii) a thrombin cleavage recognition site; and (iv) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (iv) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises (i) a truncation of at least 21 amino acids from the N terminus; (ii) a MRSSHHHHHHSSGLVPR amino acid sequence at the N terminus of the engineered reverse transcription enzyme; and (iii) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (iii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises (i) a truncation of at least 21 amino acids from the N terminus; (ii) a MRSSHHHHHHSSGLVPR amino acid sequence at the N terminus of the engineered reverse transcription enzyme; and (iii) an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (iii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises: (i) one or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and an L603 mutation relative to SEQ ID NO: 3; and (ii) one or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3. In some instances, the mutations in (i) and (ii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises: (i) two or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and an L603 mutation relative to SEQ ID NO: 3; and (ii) two or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3. In some instances, the mutations in (i) and (ii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises: (i) three or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and an L603 mutation relative to SEQ ID NO: 3; and (ii) three or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3. In some instances, the mutations in (i) and (ii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises: (i) four or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and an L603 mutation relative to SEQ ID NO: 3; and (ii) four or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3. In some instances, the mutations in (i) and (ii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises: (i) five or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and an L603 mutation relative to SEQ ID NO: 3; and (ii) five or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3. In some instances, the mutations in (i) and (ii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the engineered reverse transcription enzyme comprises: (i) six or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and an L603 mutation relative to SEQ ID NO: 3; and (ii) five or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3. In some instances, the mutations in (i) and (ii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.
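The two-group embodiments above reduce to a counting test: a variant qualifies when it carries at least a stated number of mutated positions from group (i) and at least a stated number from group (ii). The following Python sketch illustrates that test under the assumption that a variant is summarized as a set of mutated positions numbered relative to SEQ ID NO: 3; the names `GROUP_I`, `GROUP_II`, and `satisfies_two_group_rule` are hypothetical.

```python
# Positions recited for group (i) and group (ii) in the text above.
GROUP_I = frozenset({139, 200, 330, 448, 449, 524, 603})
GROUP_II = frozenset({69, 302, 306, 313, 435, 454})


def satisfies_two_group_rule(mutated_positions: set[int],
                             min_group_i: int,
                             min_group_ii: int) -> bool:
    """True if enough mutated positions fall in each of the two groups."""
    in_group_i = len(mutated_positions & GROUP_I)
    in_group_ii = len(mutated_positions & GROUP_II)
    return in_group_i >= min_group_i and in_group_ii >= min_group_ii


# Hypothetical variant mutated at positions 139, 200, and 69.
example_positions = {139, 200, 69}
meets_two_and_one = satisfies_two_group_rule(example_positions, 2, 1)
meets_two_and_two = satisfies_two_group_rule(example_positions, 2, 2)
```

Here `meets_two_and_one` is true and `meets_two_and_two` is false, because only one of the example positions belongs to group (ii). Group (i) contains seven positions and group (ii) contains six, which bounds the thresholds that can be met.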

In some embodiments, the engineered reverse transcription enzyme is encoded by a nucleotide sequence according to SEQ ID NO: 4. In some embodiments, the engineered reverse transcription enzyme has an amino acid sequence according to SEQ ID NO: 5.

In some embodiments, the engineered reverse transcription enzyme is engineered to have reduced and/or abolished RNase activity. In some embodiments, the engineered reverse transcription enzyme is engineered to have reduced and/or abolished RNase H activity. In some embodiments, the engineered reverse transcription enzyme engineered to have reduced and/or abolished RNase H activity comprises a D524 mutation. In some embodiments, the engineered reverse transcription enzyme engineered to have reduced and/or abolished RNase H activity comprises a D524A or a D524N mutation.

The engineered reverse transcription enzyme variants of the present disclosure have unexpectedly provided various improved benefits over other described and/or commercially available enzymes, such as improved thermal stability, processive reverse transcription, non-templated base addition, and template switching ability. Furthermore, the engineered reverse transcription enzyme variants described herein also exhibit unexpectedly higher resistance to cell lysate (i.e., are less inhibited by cell lysate) than commercially available RT enzymes. Lastly, the engineered reverse transcription enzyme variants of the present disclosure have an unexpectedly greater ability to capture full-length transcripts (e.g., in T-cell receptor paired transcriptional profiling), as compared to other described and commercially available MMLV reverse transcription enzymes.

V. Nucleic Acid Sample Processing Using Engineered Reverse Transcription Enzymes

Disclosed herein, in some embodiments, are methods for nucleic acid sample processing, comprising: providing a template ribonucleic acid (RNA) molecule in a reaction volume; and using an engineered reverse transcription enzyme to reverse transcribe said RNA molecule to a complementary DNA molecule, wherein said engineered reverse transcription enzyme comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 3, wherein said amino acid sequence comprises: (i) a truncation of at least 15 amino acids from the N-terminus relative to SEQ ID NO: 3; and (ii) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. In some instances, the mutations in (ii) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.
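The "at least 80% identical" criterion can be illustrated with a simple calculation. The Python sketch below assumes that the two sequences have already been aligned to equal length with "-" marking gaps, and it counts identity as matching residues divided by aligned columns. Both the alignment step and this particular definition of identity are assumptions made for illustration; the text does not specify an alignment method, and other conventions for the denominator exist. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Columns where both sequences have a gap ('-') are ignored; columns with
    a gap in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy 10-residue sequences (placeholders, not SEQ ID NO: 3).
one_substitution = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
one_gap = percent_identity("ACDE-GHIKL", "ACDEFGHIKL")
meets_threshold = one_substitution >= 80.0
```

Under this convention, an N-terminal truncation shows up as gap columns and lowers the computed identity, so the choice of denominator matters when truncated constructs are compared with the full-length reference.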

In some embodiments, the template nucleic acid molecule is a ribonucleic acid (RNA) molecule. In some embodiments, the RNA molecule is messenger RNA (mRNA). In some embodiments, the engineered reverse transcription enzymes described herein are used in a reaction volume that is less than 1 nL. In some embodiments, the engineered reverse transcription enzymes described herein are used in a reaction volume that is less than 500 pL. In some embodiments, the reaction volume is contained within a partition. In some embodiments, the reaction volume is contained within a droplet in an emulsion. In some embodiments, the reaction volume is contained within a droplet emulsion having a reaction volume of less than 1 nL. In some embodiments, the reaction volume is contained within a droplet emulsion having a reaction volume of less than 500 pL. In some embodiments, the reaction volume is contained within a well. In some embodiments, the reaction volume is contained within a well having a reaction volume less than 1 nL. In some embodiments, the reaction volume is contained within a well having a reaction volume less than 500 pL. In some embodiments, the reaction volume is contained within a well in an array of wells having an extracted nucleic acid molecule, and wherein said template nucleic acid molecule is the extracted nucleic acid molecule. In some embodiments, the reaction volume is contained within a well in an array of wells having a cell comprising a template nucleic acid molecule, and wherein said template nucleic acid molecule is released from the cell.
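To give a sense of scale for the reaction volumes recited above, the following Python sketch converts between the diameter and the volume of a droplet. It assumes an ideal spherical droplet, which is a simplification introduced here and not a statement from the disclosure; the function names are hypothetical. The conversion uses 1 pL = 1000 μm³.

```python
import math

CUBIC_UM_PER_PL = 1000.0  # 1 pL = 1e-15 m^3 = 1000 cubic micrometers


def sphere_volume_pl(diameter_um: float) -> float:
    """Volume in picoliters of a sphere with the given diameter in μm."""
    volume_um3 = math.pi / 6.0 * diameter_um ** 3
    return volume_um3 / CUBIC_UM_PER_PL


def sphere_diameter_um(volume_pl: float) -> float:
    """Diameter in μm of a sphere with the given volume in picoliters."""
    volume_um3 = volume_pl * CUBIC_UM_PER_PL
    return (6.0 * volume_um3 / math.pi) ** (1.0 / 3.0)


diameter_at_1_nl = sphere_diameter_um(1000.0)   # 1 nL = 1000 pL
diameter_at_500_pl = sphere_diameter_um(500.0)
```

Under the spherical assumption, a 1 nL droplet has a diameter of roughly 124 μm and a 500 pL droplet roughly 98 μm, so the sub-nanoliter volumes correspond to droplets on the order of 100 μm or smaller.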

In some embodiments, the reaction volume further comprises a particle comprising molecular tags (e.g., barcodes). In some embodiments, the particle is a gel bead. In some embodiments, said molecular tags are releasably attached to said gel bead. In some embodiments, the gel bead comprises a polyacrylamide polymer.

In some embodiments, a cross-section of a gel bead is less than about 100 μm. In some embodiments, a cross-section of a gel bead is less than about 60 μm. In some embodiments, a cross-section of a gel bead is less than about 50 μm. In some embodiments, a cross-section of a gel bead is less than about 40 μm. In some embodiments, a cross-section of a gel bead is less than about 100 μm, less than about 99 μm, less than about 98 μm, less than about 97 μm, less than about 96 μm, less than about 95 μm, less than about 94 μm, less than about 93 μm, less than about 92 μm, less than about 91 μm, less than about 90 μm, less than about 89 μm, less than about 88 μm, less than about 87 μm, less than about 86 μm, less than about 85 μm, less than about 84 μm, less than about 83 μm, less than about 82 μm, less than about 81 μm, less than about 80 μm, less than about 79 μm, less than about 78 μm, less than about 77 μm, less than about 76 μm, less than about 75 μm, less than about 74 μm, less than about 73 μm, less than about 72 μm, less than about 71 μm, less than about 70 μm, less than about 69 μm, less than about 68 μm, less than about 67 μm, less than about 66 μm, less than about 65 μm, less than about 64 μm, less than about 63 μm, less than about 62 μm, less than about 61 μm, or less than about 60 μm.

In some embodiments, the molecular tags (e.g., barcode oligonucleotides) include unique molecular identifiers (UMIs). In some embodiments, the UMIs comprise oligonucleotides. In some embodiments, the molecular tags are coupled to priming sequences. In some embodiments, each of said priming sequences comprises a random N-mer sequence. In some embodiments, the random N-mer sequence is complementary to a 3′ sequence of said RNA molecules. In some embodiments, the priming sequence comprises a poly-dT sequence having a length of at least 5 bases. In some embodiments, the priming sequence comprises a poly-dT sequence having a length of at least 10 bases. In some embodiments, the priming sequence comprises a poly-dT sequence having a length of at least 5 bases, at least 6 bases, at least 7 bases, at least 8 bases, at least 9 bases, or at least 10 bases.
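As an illustration of how a molecular tag, a UMI, and a poly-dT priming sequence might be combined in a single oligonucleotide, consider the following Python sketch. The 5′ to 3′ ordering (barcode, then UMI, then poly-dT), the example barcode, and the lengths used are assumptions chosen for the example and are not taken from the disclosure; the function names are hypothetical.

```python
import random

DNA_BASES = "ACGT"


def random_nmer(length: int, rng: random.Random) -> str:
    """Random DNA sequence of the given length (used here as a UMI)."""
    return "".join(rng.choice(DNA_BASES) for _ in range(length))


def build_capture_oligo(barcode: str, umi_length: int,
                        poly_dt_length: int, rng: random.Random) -> str:
    """Assemble barcode + UMI + poly-dT, written 5' to 3' (assumed order)."""
    if poly_dt_length < 5:
        raise ValueError("the text recites a poly-dT of at least 5 bases")
    umi = random_nmer(umi_length, rng)
    return barcode + umi + "T" * poly_dt_length


rng = random.Random(0)          # seeded so the example is repeatable
example_barcode = "ACGTACGTAC"  # hypothetical 10-base barcode
oligo = build_capture_oligo(example_barcode, umi_length=8,
                            poly_dt_length=10, rng=rng)
```

In this arrangement every oligonucleotide from one bead would share the barcode, while the random UMI differs from molecule to molecule, and the poly-dT stretch at the 3′ end is positioned to hybridize to a poly-A tail.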

In some embodiments, the reaction volume further comprises a cell and the template nucleic acid molecule is from said cell. In some embodiments, the reaction volume further comprises a plurality of cells and the template nucleic acid molecule is from said plurality of cells.

In some embodiments, the reverse transcription is initiated by hybridization of said priming sequences to said RNA molecules and is extended by the engineered reverse transcription enzyme in a template-directed fashion. In some embodiments, the reverse transcription enzyme adds a plurality of non-template oligonucleotides upon reverse transcription of a ribonucleic acid molecule from said nucleic acid molecules. In some embodiments, the reverse transcription reaction produces single-stranded complementary deoxyribonucleic acid (cDNA) molecules each having a molecular tag from said molecular tags on a 5′ end thereof, followed by amplification of cDNA to produce a double-stranded cDNA having the molecular tag on the 5′ end and a 3′ end of the double-stranded cDNA.
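The strand relationships described above can be sketched at the sequence level in Python. The example below treats first-strand cDNA as the reverse complement of the RNA template, places a molecular tag at the 5′ end, and appends non-templated bases at the 3′ end. The use of three cytosines as the non-templated addition and the example sequences are assumptions made for illustration; the function name `first_strand_cdna` is hypothetical.

```python
# RNA base -> complementary DNA base.
RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(rna: str, tag: str = "",
                      non_templated: str = "CCC") -> str:
    """Return first-strand cDNA written 5' to 3'.

    rna is written 5' to 3'. Synthesis starts at the RNA 3' end, so the cDNA
    is the reverse complement of the template. The tag (assumed to sit 5' of
    the priming sequence) leads the product, and the non-templated bases
    (assumed here to be cytosines) are added at the 3' end.
    """
    templated = "".join(RNA_TO_CDNA[base] for base in reversed(rna))
    return tag + templated + non_templated


# Toy template with a short poly-A tail (placeholder sequence).
toy_rna = "GGAUCCAAAAA"
cdna = first_strand_cdna(toy_rna, tag="ACGT")
```

For this toy template the product reads tag, then the poly-dT-derived stretch complementary to the poly-A tail, then the complement of the transcript body, and finally the non-templated bases, which is where a template switching oligonucleotide could anneal.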

In some embodiments, the methods for nucleic acid sample processing disclosed herein utilize the engineered reverse transcription enzymes described herein. In some embodiments, the methods for nucleic acid sample processing disclosed herein utilize engineered reverse transcription enzymes comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by two or more of: (a) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (b) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (c) a thrombin cleavage recognition site; and (d) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (d) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the methods for nucleic acid sample processing disclosed herein utilize engineered reverse transcription enzymes comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by three or more of: (a) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (b) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (c) a thrombin cleavage recognition site; and (d) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (d) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the methods for nucleic acid sample processing disclosed herein utilize engineered reverse transcription enzymes comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by: (a) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (b) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (c) a thrombin cleavage recognition site; and (d) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (d) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the nucleic acid sample processing methods disclosed herein utilize the engineered reverse transcription enzyme comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence comprises (a) a truncation of at least 21 amino acids from the N terminus; (b) a MRSSHHHHHHSSGLVPR amino acid sequence at the N terminus of the engineered reverse transcription enzyme; and (c) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (c) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the nucleic acid sample processing methods disclosed herein utilize an engineered reverse transcription enzyme having an amino acid sequence according to SEQ ID NO: 5.
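
By way of illustration only, the substitution labels recited above (e.g., E69K, meaning that the glutamate at position 69 of the reference sequence is replaced by lysine) can be handled programmatically. The following Python sketch parses such a label, applies it to a sequence, and computes a simple percent identity. The function names and the five-residue toy sequence are hypothetical and are not part of the disclosure; the toy sequence is not SEQ ID NO: 3, and the identity calculation assumes two sequences that are already aligned and of equal length, which is a simplification of how sequence identity is generally determined.

```python
import re

# Substitution labels named in the text, written as
# <original residue><1-based position><replacement residue>.
NAMED_SUBSTITUTIONS = [
    "E69K", "L139P", "D200N", "E302R", "T306K", "W313F", "T330P",
    "L435G", "L435K", "P448A", "D449G", "N454K", "D524N", "D524A",
    "L603W", "E607K",
]


def parse_substitution(label):
    """Split a label such as 'E69K' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_substitution(sequence, label):
    """Return a copy of `sequence` carrying the substitution named by `label`."""
    original, position, replacement = parse_substitution(label)
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError("reference residue does not match the label")
    return sequence[:position - 1] + replacement + sequence[position:]


def percent_identity(aligned_a, aligned_b):
    """Percent of identical positions for two pre-aligned, equal-length sequences."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    toy_reference = "MAEGT"  # hypothetical stand-in, NOT SEQ ID NO: 3
    toy_variant = apply_substitution(toy_reference, "E3K")
    print(toy_variant, percent_identity(toy_reference, toy_variant))
```

In this toy case one of five residues differs, so the variant is 80% identical to the toy reference.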

VI. Nucleic Acid Sample Processing in Sub-Nanoliter Sized Droplets Using Engineered Reverse Transcription Enzymes

Reverse transcription of mRNA from a single cell can be inhibited when the reaction volume is less than ~1 nL. To date, there have been no published studies describing how to overcome this effect.

Disclosed herein, in some embodiments, are methods for nucleic acid sample processing, comprising: (a) generating a plurality of droplets in an emulsion, wherein an individual droplet of said plurality of droplets comprises (i) a particle comprising molecular tags, and (ii) a cell having nucleic acid molecules, wherein a ratio of a volume of said particle to a volume of said individual droplet is less than 0.9, and wherein said volume of said individual droplet is less than 1 nanoliter; (b) using said molecular tags to barcode said nucleic acid molecules in a barcoding reaction that has a rate that deviates from a control rate of reaction by at most about 20%, which control rate of reaction is as determined for a control barcoding reaction in a control droplet having a control droplet volume of 1 nanoliter and comprising a single cell, thereby providing barcoded nucleic acid molecules; and (c) subjecting said barcoded nucleic acid molecules to nucleic acid sequencing to generate sequence information for at least a subset of said nucleic acid molecules.
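
As a non-limiting sketch, the comparison in step (b) between the barcoding reaction rate and the control rate can be expressed as a relative deviation. The Python below assumes that both rates are measured in the same, otherwise unspecified, units; the function names and the numerical rates in the usage lines are hypothetical.

```python
def rate_deviation(rate, control_rate):
    """Relative deviation of `rate` from `control_rate` (0.1 means 10%)."""
    if control_rate <= 0:
        raise ValueError("control rate must be positive")
    return abs(rate - control_rate) / control_rate


def within_control_tolerance(rate, control_rate, max_deviation=0.20):
    """True when the rate deviates from the control by at most `max_deviation`."""
    return rate_deviation(rate, control_rate) <= max_deviation


if __name__ == "__main__":
    # Hypothetical rates in arbitrary units; control measured in a 1 nL droplet.
    print(within_control_tolerance(0.9, 1.0))
    print(within_control_tolerance(0.5, 1.0))
```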

In some embodiments, at least 1% of said plurality of droplets comprise cells. In some embodiments, at least 10% of said plurality of droplets comprise cells. In some embodiments, at least 20% of said plurality of droplets comprise cells. In some embodiments, at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 16%, at least 17%, at least 18%, at least 19%, or at least 20% of said plurality of droplets comprise cells. In some embodiments, an individual droplet in said plurality of droplets comprises a plurality of cells.

In some embodiments, an individual droplet in said plurality of droplets comprises a gel bead comprising molecular tags. In some embodiments, said molecular tags are releasably attached to said gel bead. In some embodiments, an individual droplet in said plurality of droplets comprises a gel bead comprising a polyacrylamide polymer. In some embodiments, an individual droplet in said plurality of droplets comprises a degradable gel bead. In some embodiments, said molecular tags include unique molecular identifiers (UMIs). In some embodiments, said UMIs comprise oligonucleotides.

In some embodiments, a cross-section of a gel bead is less than about 100 μm. In some embodiments, a cross-section of a gel bead is less than about 60 μm. In some embodiments, a cross-section of a gel bead is less than about 50 μm. In some embodiments, a cross-section of a gel bead is less than about 40 μm. In some embodiments, a cross-section of a gel bead is less than about 100 μm, less than about 99 μm, less than about 98 μm, less than about 97 μm, less than about 96 μm, less than about 95 μm, less than about 94 μm, less than about 93 μm, less than about 92 μm, less than about 91 μm, less than about 90 μm, less than about 89 μm, less than about 88 μm, less than about 87 μm, less than about 86 μm, less than about 85 μm, less than about 84 μm, less than about 83 μm, less than about 82 μm, less than about 81 μm, less than about 80 μm, less than about 79 μm, less than about 78 μm, less than about 77 μm, less than about 76 μm, less than about 75 μm, less than about 74 μm, less than about 73 μm, less than about 72 μm, less than about 71 μm, less than about 70 μm, less than about 69 μm, less than about 68 μm, less than about 67 μm, less than about 66 μm, less than about 65 μm, less than about 64 μm, less than about 63 μm, less than about 62 μm, less than about 61 μm, or less than about 60 μm.

In some embodiments, the ratio of a volume of said particle to a volume of said individual droplet is less than 0.9. In some embodiments, the ratio of a volume of said particle to a volume of said individual droplet is less than 0.8. In some embodiments, the ratio of a volume of said particle to a volume of said individual droplet is less than 0.7. In some embodiments, the ratio of a volume of said particle to a volume of said individual droplet is less than 0.2. In some embodiments, the ratio of a volume of said particle to a volume of said individual droplet is less than about 0.9, less than about 0.85, less than about 0.8, less than about 0.75, less than about 0.7, less than about 0.65, less than about 0.6, less than about 0.55, less than about 0.5, less than about 0.45, less than about 0.4, less than about 0.35, less than about 0.3, less than about 0.25, or less than about 0.2.
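
For illustration, the particle-to-droplet volume ratio can be estimated from a bead cross-section and a droplet volume. The Python sketch below assumes, as a simplification not stated in the disclosure, that the gel bead is a sphere whose diameter equals the recited cross-section; the function names and the example dimensions are hypothetical.

```python
import math

UM3_PER_NL = 1.0e6  # 1 nanoliter equals 1e6 cubic micrometers


def sphere_volume_nl(diameter_um):
    """Volume in nanoliters of a sphere with the given diameter in micrometers."""
    volume_um3 = math.pi / 6.0 * diameter_um ** 3
    return volume_um3 / UM3_PER_NL


def particle_to_droplet_ratio(bead_diameter_um, droplet_volume_nl):
    """Ratio of bead volume to droplet volume, assuming a spherical bead."""
    if droplet_volume_nl <= 0:
        raise ValueError("droplet volume must be positive")
    return sphere_volume_nl(bead_diameter_um) / droplet_volume_nl


if __name__ == "__main__":
    # Hypothetical case: a 50 um bead in a 0.5 nL droplet.
    ratio = particle_to_droplet_ratio(50.0, 0.5)
    print(f"bead volume: {sphere_volume_nl(50.0):.4f} nL, ratio: {ratio:.3f}")
```

Under these assumptions a 50 μm bead occupies roughly 0.065 nL, so its ratio to a 0.5 nL droplet is roughly 0.13, whereas a 100 μm bead (roughly 0.52 nL) would exceed the volume of the same droplet.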

In some embodiments, the control barcoding reaction is a reverse transcription reaction conducted on nucleic acid molecules from said single cell. In some embodiments, the reverse transcription reaction is conducted in said control droplet using a reverse transcription enzyme having an amino acid sequence as set forth in SEQ ID NO: 3. In some embodiments, the reverse transcription reaction is conducted in said control droplet using a reverse transcription enzyme having a sequence as set forth in SEQ ID NO: 5. In some embodiments, the control droplet comprises an additional particle comprising molecular tags. In some embodiments, the barcoding reaction is an enzymatic reaction. In some embodiments, the barcoding reaction is a reverse transcription amplification reaction that generates complementary deoxyribonucleic acid (cDNA) molecules upon reverse transcription of ribonucleic acid (RNA) molecules of said cell. In some embodiments, the RNA molecules are released from said cell. In some embodiments, the RNA molecules are released from said cell by lysing said cell. In some embodiments, the RNA molecules are messenger RNA (mRNA).

In some embodiments, the molecular tags are coupled to priming sequences and the barcoding reaction is initiated by hybridization of said priming sequences to said RNA molecules. In some embodiments, each of said priming sequences comprises a random N-mer sequence. In some embodiments, said random N-mer sequence is complementary to a 3′ sequence of a ribonucleic acid molecule of said cell. In some embodiments, said random N-mer sequence comprises a poly-dT sequence having a length of at least 5 bases. In some embodiments, said random N-mer sequence comprises a poly-dT sequence having a length of at least 10 bases. In some embodiments, the barcoding reaction is performed by extending each of said priming sequences in a template directed fashion using reagents for reverse transcription. In some embodiments, the reagents for reverse transcription comprise a reverse transcription enzyme, a buffer and a mixture of nucleotides. In some embodiments, the reverse transcription enzyme adds a plurality of non-template oligonucleotides upon reverse transcription of a ribonucleic acid molecule from said nucleic acid molecules. In some embodiments, the reverse transcription enzyme is an engineered reverse transcription enzyme as disclosed herein.

In some embodiments, the barcoding reaction produces single stranded complementary deoxyribonucleic acid (cDNA) molecules each having a molecular tag from said molecular tags on a 5′ end thereof, followed by amplification of cDNA to produce a double stranded cDNA having the molecular tag on the 5′ end and a 3′ end of the double stranded cDNA.
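
The placement of the molecular tag on the 5′ end of the first-strand cDNA follows from template-directed extension of a tagged poly-dT primer, and can be sketched as a string operation. The Python below is a deliberately simplified model: it assumes that the primer anneals to the last bases of the poly-A tail, and it ignores non-templated addition, template switching, and the later amplification to double stranded cDNA. The barcode, UMI, and mRNA strings and the function names are hypothetical.

```python
# Watson-Crick pairing from an RNA template base to the DNA base copied opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement_as_dna(rna):
    """DNA reverse complement of an RNA string written 5' to 3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


def first_strand_cdna(mrna, barcode, umi, dt_length=10):
    """Model first-strand cDNA (5' to 3') primed by barcode + UMI + poly-dT.

    The primer's poly-dT segment is assumed to pair with the final
    `dt_length` adenines of the mRNA; extension then copies the rest
    of the transcript upstream of that annealed stretch.
    """
    if not mrna.endswith("A" * dt_length):
        raise ValueError("mRNA lacks a poly-A tail long enough for the primer")
    primer = barcode + umi + "T" * dt_length
    template = mrna[:len(mrna) - dt_length]
    return primer + reverse_complement_as_dna(template)


if __name__ == "__main__":
    toy_mrna = "AUGGCC" + "A" * 12  # hypothetical transcript with a poly-A tail
    print(first_strand_cdna(toy_mrna, barcode="ACGT", umi="GG"))
```

In this model the barcode and UMI always occupy the 5′ end of the product because extension proceeds only from the 3′ end of the primer.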

In some embodiments, the barcoded nucleic acid molecules comprising the double stranded cDNA from an individual droplet are released prior to sequencing the barcoded nucleic acid molecules. In some embodiments, the barcoded nucleic acid molecules from said plurality of droplets are pooled prior to sequencing the barcoded nucleic acid molecules.

In some embodiments, the methods for nucleic acid sample processing disclosed herein, wherein an individual droplet is less than 1 nanoliter, utilize the engineered reverse transcription enzymes described herein. In some embodiments, the methods for nucleic acid sample processing disclosed herein, wherein an individual droplet is less than 1 nanoliter, utilize engineered reverse transcription enzymes comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by two or more of: (a) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (b) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (c) a thrombin cleavage recognition site; and (d) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (d) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the methods for nucleic acid sample processing disclosed herein, wherein an individual droplet is less than 1 nanoliter, utilize engineered reverse transcription enzymes comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by three or more of: (a) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (b) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (c) a thrombin cleavage recognition site; and (d) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (d) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the methods for nucleic acid sample processing disclosed herein, wherein an individual droplet is less than 1 nanoliter, utilize engineered reverse transcription enzymes comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence is characterized by: (a) a truncation of at least 15 amino acids from an N terminus of said amino acid sequence; (b) a sequence of at least 5 histidine amino acids at said N terminus of said amino acid sequence; (c) a thrombin cleavage recognition site; and (d) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (d) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the nucleic acid sample processing methods disclosed herein, wherein an individual droplet is less than 1 nanoliter, utilize the engineered reverse transcription enzyme comprising an amino acid sequence that is at least 80% identical to an amino acid sequence of SEQ ID NO: 3, wherein said amino acid sequence comprises (a) a truncation of at least 21 amino acids from the N terminus; (b) a MRSSHHHHHHSSGLVPR amino acid sequence at the N terminus of the engineered reverse transcription enzyme; and (c) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to an amino acid sequence of SEQ ID NO: 3. In some instances, one or more mutations in (c) are selected from the group consisting of an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G or L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.

In some embodiments, the nucleic acid sample processing methods disclosed herein, wherein an individual droplet is less than 1 nanoliter, utilize an engineered reverse transcription enzyme having an amino acid sequence according to SEQ ID NO: 5.

VII. Devices and Systems

Also provided herein are the microfluidic devices used for partitioning the cells as described above. Such microfluidic devices can comprise channel networks for carrying out the partitioning process like those set forth in FIG. 1 and FIG. 2. Examples of particularly useful microfluidic devices are described in U.S. Provisional Patent Application No. 61/977,804, filed Apr. 4, 2014, and incorporated herein by reference in its entirety for all purposes. Briefly, these microfluidic devices can comprise channel networks, such as those described herein, for partitioning cells into separate partitions, and co-partitioning such cells with oligonucleotide barcode library members, e.g., disposed on beads. These channel networks can be disposed within a solid body, e.g., a glass, semiconductor or polymer body structure in which the channels are defined, where those channels communicate at their termini with reservoirs for receiving the various input fluids, and for the ultimate deposition of the partitioned cells, etc., from the output of the channel networks.

Also provided are systems that control flow of these fluids through the channel networks, e.g., through applied pressure differentials, centrifugal force, electrokinetic pumping, capillary or gravity flow, or the like.

VIII. Kits

Also provided herein are kits for performing a reverse transcription reaction, the kit comprising (a) an engineered reverse transcription enzyme of the present disclosure and (b) instructions for using said engineered reverse transcription enzyme to perform a reverse transcription reaction. The engineered reverse transcription enzyme may comprise (i) a truncation of at least 15 amino acids from the N-terminus relative to SEQ ID NO: 3; and (ii) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3. The kit may also include suitable reaction buffers, dNTPs, one or more primers, one or more control reagents, or any other reagents disclosed for performing the methods of the present disclosure. The engineered reverse transcription enzyme, reaction buffer, and dNTPs may be provided separately or may be provided together in a master mix solution. In cases in which the engineered reverse transcription enzyme, reaction buffer, and dNTPs are provided in a master mix, the master mix is present at a concentration at least two times the working concentration indicated in instructions for use in the reverse transcription reaction. In other cases, the master mix may be present at a concentration at least three times, at least four times, at least five times, at least six times, at least seven times, at least eight times, at least nine times, or at least ten times, the working concentration indicated. The primer in the kits may be a poly-dT primer, a random N-mer primer, or a target-specific primer.
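
As an illustrative calculation, a master mix supplied at a multiple of its working concentration is diluted to 1× in the final reaction, so the master mix volume is the final reaction volume divided by the fold concentration. The Python sketch below shows this arithmetic; the 20 μL reaction volume and the function names are hypothetical and are not taken from the disclosure.

```python
def master_mix_volume(final_volume_ul, fold_concentration):
    """Volume of an N-fold master mix needed to reach 1x in the final reaction."""
    if fold_concentration < 1:
        raise ValueError("fold concentration must be at least 1")
    return final_volume_ul / fold_concentration


def remaining_volume(final_volume_ul, fold_concentration):
    """Volume left over for template, primers, and water."""
    return final_volume_ul - master_mix_volume(final_volume_ul, fold_concentration)


if __name__ == "__main__":
    # Hypothetical 20 uL reaction assembled from master mixes of different strength.
    for fold in (2, 5, 10):
        print(fold, master_mix_volume(20.0, fold), remaining_volume(20.0, fold))
```

A more concentrated master mix leaves more of the reaction volume available for sample, which may matter when the input is dilute.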

The kits may further include one, two, three, four, five or more, up to all of partitioning fluids, including both aqueous buffers and non-aqueous partitioning fluids or oils, nucleic acid barcode libraries that are releasably associated with beads, as described herein, microfluidic devices, reagents for disrupting cells, amplifying nucleic acids, and providing additional functional sequences on fragments of cellular nucleic acids or replicates thereof, as well as instructions for using any of the foregoing in the methods described herein.

The instructions for using any of the methods are generally recorded on a suitable recording medium (e.g. printed on a substrate such as paper or plastic). As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging). In some cases, the instructions may be present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, etc. In other cases, the actual instructions may not be present in the kit, but means for obtaining the instructions from a remote source, e.g. via the internet, may be provided. An example is a kit that includes a web address where the instructions may be viewed and/or from which the instructions may be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.

IX. Computer Control Systems

The present disclosure provides computer control systems that are programmed to implement methods of the disclosure. FIG. 17 shows a computer system 1701 that is programmed or otherwise configured to implement methods of the disclosure including nucleic acid sequencing methods, interpretation of nucleic acid sequencing data and analysis of cellular nucleic acids, such as RNA (e.g., mRNA), and characterization of cells from sequencing data. The computer system 1701 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.

The computer system 1701 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1705, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1701 also includes memory or memory location 1710 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1715 (e.g., hard disk), communication interface 1720 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1725, such as cache, other memory, data storage and/or electronic display adapters. The memory 1710, storage unit 1715, interface 1720 and peripheral devices 1725 are in communication with the CPU 1705 through a communication bus (solid lines), such as a motherboard. The storage unit 1715 can be a data storage unit (or data repository) for storing data. The computer system 1701 can be operatively coupled to a computer network (“network”) 1730 with the aid of the communication interface 1720. The network 1730 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1730 in some cases is a telecommunication and/or data network. The network 1730 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1730, in some cases with the aid of the computer system 1701, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1701 to behave as a client or a server.

The CPU 1705 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1710. The instructions can be directed to the CPU 1705, which can subsequently program or otherwise configure the CPU 1705 to implement methods of the present disclosure. Examples of operations performed by the CPU 1705 can include fetch, decode, execute, and writeback.

The CPU 1705 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1701 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).

The storage unit 1715 can store files, such as drivers, libraries and saved programs. The storage unit 1715 can store user data, e.g., user preferences and user programs. The computer system 1701 in some cases can include one or more additional data storage units that are external to the computer system 1701, such as located on a remote server that is in communication with the computer system 1701 through an intranet or the Internet.

The computer system 1701 can communicate with one or more remote computer systems through the network 1730. For instance, the computer system 1701 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1701 via the network 1730.

Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1701, such as, for example, on the memory 1710 or electronic storage unit 1715. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1705. In some cases, the code can be retrieved from the storage unit 1715 and stored on the memory 1710 for ready access by the processor 1705. In some situations, the electronic storage unit 1715 can be precluded, and machine-executable instructions are stored on memory 1710.

The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as the computer system 1701, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.

Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 1701 can include or be in communication with an electronic display 1735 that comprises a user interface (UI) 1740 for providing, for example, results of nucleic acid sequencing, analysis of nucleic acid sequencing data, characterization of nucleic acid sequencing samples, cell characterizations, etc. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.

Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 1705. The algorithm can, for example, initiate nucleic acid sequencing, process nucleic acid sequencing data, interpret nucleic acid sequencing results, characterize nucleic acid samples, characterize cells, etc.

Examples

Example 1. Cellular RNA Analysis Using Emulsions

In an example, reverse transcription with template switching and cDNA amplification (via PCR) is performed in emulsion droplets with operations as shown in FIG. 9A. The reaction mixture that is partitioned for reverse transcription and cDNA amplification (via PCR) includes 1,000 cells or 10,000 cells or 10 ng of RNA, beads bearing barcoded oligonucleotides/0.2% Tx-100/5× Kapa buffer, 2× Kapa HS HiFi Ready Mix, 4 μM switch oligo, and Smartscribe. Where cells are present, the mixture is partitioned such that a majority or all of the droplets comprise a single cell and single bead. The cells are lysed while the barcoded oligonucleotides are released from the bead, and the poly-dT segment of the barcoded oligonucleotide hybridizes to the poly-A tail of mRNA that is released from the cell as in operation 950. The poly-dT segment is extended in a reverse transcription reaction as in operation 952 and the cDNA transcript is amplified as in operation 954. The thermal cycling conditions are 42° C. for 130 minutes; 98° C. for 2 min; and 35 cycles of the following: 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 6 min. Following thermal cycling, the emulsion is broken and the transcripts are purified with Dynabeads and 0.6× SPRI as in operation 956.
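
For planning purposes, the thermal cycling conditions recited in this example can be written as a small data structure and summed to estimate the total programmed hold time. The Python sketch below is illustrative only; the stage labels are descriptive assumptions rather than terms from the protocol, and ramp times between temperatures are not included.

```python
# Each stage: number of cycles and a list of (temperature in deg C, hold in seconds).
# Stage labels are assumed descriptions, not wording from the example.
PROFILE = [
    {"label": "reverse transcription", "cycles": 1, "steps": [(42, 130 * 60)]},
    {"label": "initial denaturation", "cycles": 1, "steps": [(98, 2 * 60)]},
    {"label": "amplification", "cycles": 35,
     "steps": [(98, 15), (60, 20), (72, 6 * 60)]},
]


def total_hold_seconds(profile):
    """Sum of programmed hold times over all stages and cycles, in seconds."""
    return sum(
        stage["cycles"] * sum(seconds for _, seconds in stage["steps"])
        for stage in profile
    )


if __name__ == "__main__":
    seconds = total_hold_seconds(PROFILE)
    print(f"{seconds} s, about {seconds / 3600:.2f} h excluding ramp times")
```

The 35 amplification cycles account for 13,825 of the 21,745 seconds, so the 6 minute extension step dominates the run time of roughly six hours.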

The yield from template switch reverse transcription and PCR in emulsions is shown for 1,000 cells in FIG. 13A and 10,000 cells in FIG. 13C and 10 ng of RNA in FIG. 13B (Smartscribe line). The cDNA transcripts from RT and PCR performed in emulsions for 10 ng RNA are sheared and ligated to functional sequences, cleaned up with 0.8× SPRI, and further amplified by PCR as in operation 958. The amplification product is cleaned up with 0.8× SPRI. The yield from this processing is shown in FIG. 13B (SSII line).

Example 2. Cellular RNA Analysis Using Emulsions

In another example, reverse transcription with template switching and cDNA amplification (via PCR) is performed in emulsion droplets with operations as shown in FIG. 9A. The reaction mixture that is partitioned for reverse transcription and cDNA amplification (via PCR) includes Jurkat cells, beads bearing barcoded oligonucleotides/0.2% TritonX-100/5× Kapa buffer, 2× Kapa HS HiFi Ready Mix, 4 μM switch oligo, and Smartscribe. The mixture is partitioned such that a majority or all of the droplets comprise a single cell and single bead. The cells are lysed while the barcoded oligonucleotides are released from the bead, and the poly-dT segment of the barcoded oligonucleotide hybridizes to the poly-A tail of mRNA that is released from the cell as in operation 950. The poly-dT segment is extended in a reverse transcription reaction as in operation 952 and the cDNA transcript is amplified as in operation 954. The thermal cycling conditions are 42° C. for 130 minutes; 98° C. for 2 min; and 35 cycles of the following: 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 6 min. Following thermal cycling, the emulsion is broken and the transcripts are cleaned up with Dynabeads and 0.6× SPRI as in operation 956. The yield from reactions with various cell numbers (625 cells, 1,250 cells, 2,500 cells, 5,000 cells, and 10,000 cells) is shown in FIG. 14A. These yields are confirmed with GAPDH qPCR assay results shown in FIG. 14B.

Example 3. RNA Analysis Using Emulsions

In another example, reverse transcription is performed in emulsion droplets and cDNA amplification is performed in bulk in a manner similar to that as shown in FIG. 9C. The reaction mixture that is partitioned for reverse transcription includes beads bearing barcoded oligonucleotides, 10 ng Jurkat RNA (e.g., Jurkat mRNA), 5× First-Strand buffer, and Smartscribe. The barcoded oligonucleotides are released from the bead, and the poly-dT segment of the barcoded oligonucleotide hybridizes to the poly-A tail of the RNA as in operation 961. The poly-dT segment is extended in a reverse transcription reaction as in operation 963. The thermal cycling conditions for reverse transcription are one cycle at 42° C. for 2 hours and one cycle at 70° C. for 10 min. Following thermal cycling, the emulsion is broken and RNA and cDNA transcripts are denatured as in operation 962. A second strand is then synthesized by primer extension with a primer having a biotin tag as in operation 964. The reaction conditions for this primer extension include cDNA transcript as the first strand and biotinylated extension primer ranging in concentration from 0.5 to 3.0 μM. The thermal cycling conditions are one cycle at 98° C. for 3 min and one cycle of 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 30 min. Following primer extension, the second strand is pulled down with Dynabeads MyOne Streptavidin C1 and T1, and cleaned up with Agilent SureSelect XT buffers. The second strand is pre-amplified via PCR as in operation 965 with the following cycling conditions: one cycle at 98° C. for 3 min and one cycle of 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 30 min. The yield for various concentrations of biotinylated primer (0.5 μM, 1.0 μM, 2.0 μM, and 3.0 μM) is shown in FIG. 15.

Example 4. RNA Analysis Using Emulsions

In another example, in vitro transcription by T7 polymerase is used to produce RNA transcripts as shown in FIG. 10. The mixture that is partitioned for reverse transcription includes beads bearing barcoded oligonucleotides which also include a T7 RNA polymerase promoter sequence, 10 ng human RNA (e.g., human mRNA), 5× First-Strand buffer, and Smartscribe. The mixture is partitioned such that a majority or all of the droplets comprise a single bead. The barcoded oligonucleotides are released from the bead, and the poly-dT segment of the barcoded oligonucleotide hybridizes to the poly-A tail of the RNA as in operation 1050. The poly-dT segment is extended in a reverse transcription reaction as in operation 1052. The thermal cycling conditions are one cycle at 42° C. for 2 hours and one cycle at 70° C. for 10 min. Following thermal cycling, the emulsion is broken and the remaining operations are performed in bulk. A second strand is then synthesized by primer extension as in operation 1054. The reaction conditions for this primer extension include cDNA transcript as template and extension primer. The thermal cycling conditions are one cycle at 98° C. for 3 min and one cycle of 98° C. for 15 sec, 60° C. for 20 sec, and 72° C. for 30 min. Following this primer extension, the second strand is purified with 0.6× SPRI. As in operation 1056, in vitro transcription is then performed to produce RNA transcripts. In vitro transcription is performed overnight, and the transcripts are purified with 0.6× SPRI. The RNA yields from in vitro transcription are shown in FIG. 16.

Example 5. RNA Analysis of Droplets of Less than 1 nL

A clear body of evidence shows that reverse transcription of mRNA from a single cell is inhibited by one or more unknown components present in the cell lysate when the reaction volume is less than ~1 nL. To overcome this inhibition and facilitate the utilization of smaller reaction volumes for increased sample throughput, engineered MMLV RT enzymes as disclosed herein were generated and tested in droplets containing picoliter-sized reaction volumes. One such engineered MMLV RT enzyme, enzyme 42B (SEQ ID NO: 5), demonstrated reduced inhibition of reverse transcription in a 350 pL reaction volume as compared to a commercially available mutant MMLV RT enzyme (CA-MMLV).

TABLE 2 Sample Conditions

Condition  Sample  Droplet Volume  Enzyme
1          GEM-U   1.1 nL          CA-MMLV (Control)
2          GEM-L   350 pL          CA-MMLV
3          GEM-L   350 pL          42B
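To give a sense of scale for the two droplet volumes in Table 2, the following minimal Python sketch estimates how much more concentrated the lysate of a single cell becomes in a 350 pL droplet than in a 1.1 nL droplet. It assumes, purely for illustration, that the amount of any inhibitory component released per cell is fixed and that the component mixes evenly through the droplet; the function name is hypothetical.

```python
def relative_lysate_concentration(reference_volume_pl, test_volume_pl):
    """Fold increase in concentration when the same amount of lysate
    is confined to test_volume_pl instead of reference_volume_pl."""
    if reference_volume_pl <= 0 or test_volume_pl <= 0:
        raise ValueError("volumes must be positive")
    return reference_volume_pl / test_volume_pl


# Volumes from Table 2, expressed in picoliters (1.1 nL = 1,100 pL).
fold = relative_lysate_concentration(1100.0, 350.0)
print(f"lysate is ~{fold:.2f}x more concentrated in the 350 pL droplet")
```

Under that assumption, any lysate-derived inhibitor would be about three times more concentrated in the smaller droplet.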

As shown in FIG. 18-FIG. 20 and Table 3, the RT reaction was inhibited when using the commercially available mutant MMLV RT enzyme in a 350 pL droplet volume. Conversely, no inhibition was observed when using enzyme 42B in a 350 pL droplet volume, indicating that enzyme 42B is more tolerant to cell lysate inhibition in droplets <1 nL in volume and provides greater library complexity as compared to a commercially available mutant MMLV RT enzyme.

TABLE 3 Comparative Results in 350 pL droplets

                                                            Specification  1.1 nL (GEM-U)  350 pL (GEM-L)  350 pL (GEM-L)
Metric                                                                     CA-MMLV         CA-MMLV         42B
Down Sampled Metrics
hg19 Median Genes per Cell (50K RRPC)                                      4,598           3,671           5,386
hg19 Median UMI Counts per Cell (50K RRPC)                                 24,473          13,481          21,478
mm10 Median Genes per Cell (50K RRPC)                                      3,100           2,150           3,484
mm10 Median UMI Counts per Cell (50K RRPC)                                 12,268          5,959           11,958
Mean UMI Count Purity (50K RRPC)                            ≥98%           98.50%          99.30%          99.40%
cDNA PCR Duplication Rate (50K RRPC)                                       32.20%          56.80%          15.60%
Multiplet Rate (per 1000 Cells)                             ≤2.0%
Mapping Rate Metrics
Fraction of Reads Useable                                                  56.40%          46.00%          43.50%
Fraction of Reads with Primer or Homopolymer Sequence                      3.50%           4.50%           7.60%
Fraction of rRNA Reads                                                     0.10%           0.10%           0.50%
Fraction of mtRNA Reads                                                    4.30%           3.70%           6.60%
Fraction of Reads Mapping Confidently to the Transcriptome  ≥50.0%         68.00%          61.50%          52.70%
Human (hg19) Reads Confidently Mapped to
  Transcriptome                                                            39.00%          36.30%          31.20%
  Exonic Regions                                                           41.60%          39.10%          33.80%
  Intronic Regions                                                         8.90%           12.80%          18.00%
  Intergenic Regions                                                       2.40%           3.00%           4.80%
Mouse (mm10) Reads Confidently Mapped to
  Transcriptome                                                            29.00%          25.20%          21.50%
  Exonic Regions                                                           30.30%          26.50%          22.50%
  Intronic Regions                                                         5.40%           7.50%           7.90%
  Intergenic Regions                                                       1.50%           1.80%           1.90%
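The sensitivity rows of Table 3 can be read more easily as percent changes relative to the 1.1 nL CA-MMLV control. The following minimal Python sketch does that arithmetic for the four down-sampled per-cell metrics; the dictionary layout and function name are illustrative assumptions, and the values are copied from Table 3.

```python
# (1.1 nL CA-MMLV control, 350 pL CA-MMLV, 350 pL 42B), values from Table 3.
TABLE_3 = {
    "hg19 median genes per cell": (4598, 3671, 5386),
    "hg19 median UMI counts per cell": (24473, 13481, 21478),
    "mm10 median genes per cell": (3100, 2150, 3484),
    "mm10 median UMI counts per cell": (12268, 5959, 11958),
}


def percent_change(value, control):
    """Signed percent change of value relative to control."""
    return 100.0 * (value - control) / control


# For each metric: (change for CA-MMLV at 350 pL, change for 42B at 350 pL).
changes = {
    metric: (percent_change(ca_small, ctl), percent_change(mutant_small, ctl))
    for metric, (ctl, ca_small, mutant_small) in TABLE_3.items()
}

for metric, (ca_pct, mutant_pct) in changes.items():
    print(f"{metric:32s} CA-MMLV {ca_pct:+6.1f}%   42B {mutant_pct:+6.1f}%")
```

On these numbers, CA-MMLV in 350 pL droplets detects about 20% fewer hg19 genes per cell than the 1.1 nL control, whereas enzyme 42B in 350 pL droplets detects about 17% more; median UMI counts for 42B remain somewhat below the control in both species.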

Example 6. Analysis of an Engineered MMLV RT Enzyme Variant in Single Cell Transcriptional Profiling

Cells were harvested and washed to remove contaminants. Droplets comprising a single cell, a single gel bead, and RT Master Mix were generated as disclosed herein. Barcoded primers were released from the gel bead and mixed with cell lysate and Master Mix to produce barcoded, full-length cDNA ready for next-generation sequencing. Barcoded cDNA was generated using either an engineered MMLV RT enzyme as disclosed herein (e.g., enzyme 42B) or a commercially available mutant MMLV RT enzyme (CA-MMLV).

TABLE 4 CA-MMLV Results

CA-MMLV Enzyme Sample                       10 U/μL   15 U/μL   20 U/μL   25 U/μL
Sample ID                                   27325     27326     27327     27328
hg19 Median UMI counts per cell (20K RRPC)  10,478    10,738    10,749    10,577
mm10 Median UMI counts per cell (20K RRPC)  6,683     6,982     6,893     6,859
Fraction UMI counts for genes <500 nt       0.8%      1.0%      1.1%      1.1%
Fraction UMI counts for genes 500-1000 nt   29.4%     30.9%     32.4%     32.7%
Fraction UMI counts for genes 1000-1500 nt  29.0%     28.7%     28.3%     28.5%
Fraction UMI counts for genes >1500 nt      40.3%     39.4%     38.2%     37.7%
Fraction ribosomal protein UMI counts       26.4%     28.5%     29.8%     30.0%
Fraction mitochondrial UMI counts           2.1%      2.1%      2.1%      1.9%

TABLE 5 CA-MMLV Results

CA-MMLV Enzyme Sample                                  10 U/μL   15 U/μL   20 U/μL   25 U/μL
Sample ID                                              27325     27326     27327     27328
Mean Reads per Cell                                    51,623    87,837    93,780    57,701
hg19 Fraction of Reads in Cells                        80.2%     82.6%     81.1%     84.4%
mm10 Fraction of Reads in Cells                        84.6%     85.4%     85.2%     86.6%
Fraction of Reads Useable                              54.1%     54.2%     53.5%     58.8%
Fraction of mtRNA reads                                1.9%      1.9%      2.0%      1.9%
Fraction of reads with primer or homopolymer sequence  2.2%      2.1%      2.0%      2.1%
cDNA PCR Duplication (20K RRPC)                        20.2%     17.2%     15.4%     15.1%
hg19 Median genes per cell (20K RRPC)                  3,050     3,035     3,010     2,939
mm10 Median genes per cell (20K RRPC)                  2,257     2,255     2,190     2,186

TABLE 6 Engineered RT Mutant Results

Mutant 42B Sample                                      6 U/μL    9 U/μL    12 U/μL   15 U/μL
Sample ID                                              27329     27330     27331     27332
Mean Reads per Cell                                    55,525    53,843    28,539    47,768
hg19 Fraction of Reads in Cells                        89.4%     87.3%     90.1%     91.5%
mm10 Fraction of Reads in Cells                        90.5%     90.7%     90.8%     91.6%
Fraction of Reads Useable                              60.2%     58.6%     59.2%     58.8%
Fraction of mtRNA reads                                8.5%      7.9%      7.7%      9.1%
Fraction of reads with primer or homopolymer sequence  1.6%      1.6%      1.6%      1.5%
cDNA PCR Duplication (20K RRPC)                        36.5%     28.6%     26.0%     19.3%
hg19 Median genes per cell (20K RRPC)                  2,985     3,060     3,148     3,176
mm10 Median genes per cell (20K RRPC)                  2,167     2,341     2,367     2,242

TABLE 7 Engineered RT Mutant Results

Mutant 42B Sample                           ~6 U/μL   ~9 U/μL   ~12 U/μL  ~15 U/μL
Sample ID                                   27329     27330     27331     27332
hg19 Median UMI counts per cell (20K RRPC)  8,742     9,484     10,070    11,253
mm10 Median UMI counts per cell (20K RRPC)  5,322     6,185     6,312     6,630
Fraction UMI counts for genes <500 nt       0.5%      0.6%      0.5%      0.9%
Fraction UMI counts for genes 500-1000 nt   23.6%     24.4%     24.2%     30.1%
Fraction UMI counts for genes 1000-1500 nt  26.1%     26.7%     27.2%     27.7%
Fraction UMI counts for genes >1500 nt      49.8%     28.4%     48.1%     41.4%
Fraction ribosomal protein UMI counts       17.3%     18.5%     18.5%     22.3%
Fraction mitochondrial UMI counts           9.4%      8.7%      8.1%      9.7%

As seen in Tables 4-7, while engineered RT enzyme 42B results in a roughly equal library complexity and a roughly equal duplication rate at a given sequencing depth, reactions including enzyme 42B generate a library biased toward longer genes.
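The length bias noted above can be summarized from the gene-length bins printed in Tables 4 and 7. The following minimal Python sketch compares the median fraction of UMI counts assigned to genes >1500 nt for each enzyme, and also checks whether the four length bins in each column sum to roughly 100%. The function names and the 1.0 percentage-point tolerance are assumptions made for illustration only.

```python
from statistics import median

# Percent of UMI counts per gene-length bin, as printed in Tables 4 and 7.
# Bin order: <500 nt, 500-1000 nt, 1000-1500 nt, >1500 nt.
CA_MMLV = {
    "10 U/uL": (0.8, 29.4, 29.0, 40.3),
    "15 U/uL": (1.0, 30.9, 28.7, 39.4),
    "20 U/uL": (1.1, 32.4, 28.3, 38.2),
    "25 U/uL": (1.1, 32.7, 28.5, 37.7),
}
ENZYME_42B = {
    "~6 U/uL": (0.5, 23.6, 26.1, 49.8),
    "~9 U/uL": (0.6, 24.4, 26.7, 28.4),
    "~12 U/uL": (0.5, 24.2, 27.2, 48.1),
    "~15 U/uL": (0.9, 30.1, 27.7, 41.4),
}


def long_gene_median(table):
    """Median across samples of the >1500 nt bin (last entry of each tuple)."""
    return median(bins[-1] for bins in table.values())


def columns_not_summing_to_100(table, tolerance=1.0):
    """Names of samples whose four bins deviate from 100% by more than tolerance."""
    return [name for name, bins in table.items()
            if abs(sum(bins) - 100.0) > tolerance]


print("CA-MMLV median >1500 nt:", long_gene_median(CA_MMLV))
print("42B median >1500 nt:    ", long_gene_median(ENZYME_42B))
print("42B columns off 100%:   ", columns_not_summing_to_100(ENZYME_42B))
```

The median is used rather than the mean so that a single atypical column has less influence on the comparison; the sum check flags the ~9 U/μL column of Table 7, whose printed bins total about 80%.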

Example 7. Analysis of an Engineered MMLV RT Enzyme Variant in Single Cell Full-Length Paired V(D)J Transcriptional Profiling

Lymphocytes were harvested and partitioned into a droplet emulsion such that droplets were generated comprising a single cell, a single gel bead comprising barcode oligonucleotides, and reagents for reverse transcription as described elsewhere herein. Barcoded cDNA was generated (see, e.g., FIG. 11 and accompanying text) using either engineered MMLV RT enzyme 42B (SEQ ID NO: 5) or a comparable, commercially available MMLV RT enzyme (CA-MMLV) to analyze the performance of enzyme 42B in the characterization of lymphocyte T-cell receptor (TCR) alpha and beta chains.

TABLE 8 Productive Pairs per Targeted Cell Recovery

Targeted Cell Recovery  RT Enzyme  Cells with Productive V-J Spanning (TRA, TRB) Pair
1K T-cells              CA-MMLV    24.5%
                        42B        39.4%
6k T-cells              CA-MMLV    24.0%
                        42B        44.7%

As seen in Table 8, enzyme 42B demonstrated improved sensitivity to low cell count compared to CA-MMLV. Additionally, as seen in FIG. 21 and FIG. 22, enzyme 42B-containing samples exhibited a higher fraction of cells with full-length productive TCR-alpha and TCR-beta pairs while exhibiting lower partial pR1 concatemer side products. Overall, use of enzyme 42B resulted in greatly enhanced TCR assembly, due to a greater number of TCR mRNA molecules detected and a reduced background of primer concatemers.
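The size of the improvement in Table 8 can be expressed both as a percentage-point gain and as a fold change. The following is a minimal Python sketch of that arithmetic; the data layout and function name are illustrative assumptions, and the values are copied from Table 8.

```python
# Percent of cells with a productive V-J spanning (TRA, TRB) pair, from Table 8.
# Each entry maps targeted cell recovery -> (CA-MMLV, enzyme 42B).
PRODUCTIVE_PAIRS = {
    "1K T-cells": (24.5, 39.4),
    "6k T-cells": (24.0, 44.7),
}


def improvement(ca_percent, mutant_percent):
    """Return (percentage-point gain, fold change) of 42B over CA-MMLV."""
    return mutant_percent - ca_percent, mutant_percent / ca_percent


results = {load: improvement(*pair) for load, pair in PRODUCTIVE_PAIRS.items()}

for load, (gain, fold) in results.items():
    print(f"{load}: +{gain:.1f} percentage points, {fold:.2f}x")
```

On these values, enzyme 42B yields roughly 1.6 to 1.9 times as many cells with a productive paired TCR as CA-MMLV at the two targeted cell recoveries.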

Example 8. Analysis of an Engineered MMLV RT Enzyme Variant in Transcriptional Profiling

Human PBMCs were harvested and partitioned into a droplet emulsion such that droplets were generated comprising a single cell, a single gel bead comprising barcode oligonucleotides, and reagents for reverse transcription as previously described herein. Barcoded cDNA was generated using either engineered MMLV RT enzyme 42B (SEQ ID NO: 5) or a comparable, commercially available MMLV RT enzyme (CA-MMLV) to analyze the performance of enzyme 42B in both a 3′ reverse transcription assay (see, e.g., FIG. 9A and accompanying text) and in a 5′ reverse transcription assay (see, e.g., FIG. 11 and accompanying text). Gel beads comprising a releasable barcoded oligonucleotide comprising a poly-dT sequence (3′ assay) or a template switching sequence (5′ assay) were used to generate full-length cDNA, which was then pooled and processed as described elsewhere herein for next-generation sequencing and analysis.

TABLE 9 Comparison between a CA-MMLV and Mutant Enzyme 42B in a 3′ assay

Enzyme                                                          CA-MMLV              CA-MMLV              Mutant 42B           Mutant 42B
Description                                                     SC3′v2-Maxima-Rep 1  SC3′v2-Maxima-Rep 2  SC3′v2-42B-Rep 1     SC3′v2-42B-Rep 2
Cell Load                                                       1000                 1000                 1000                 1000
Mean Reads per Cell                                             257,168              288,724              260,854              256,403
Valid Barcodes                                                  98.30%               98.30%               98.30%               98.20%
Reads Mapped Confidently to Transcriptome                       59.20%               59.80%               52.50%               52.70%
Reads Mapped Confidently to Intergenic Regions                  4.20%                4.00%                4.70%                4.70%
Reads Mapped Confidently to Intronic Regions                    21.30%               20.80%               25.60%               25.50%
Reads Mapped Confidently to Exonic Regions                      62.30%               63.00%               55.00%               55.20%
Reads Mapped Antisense to Gene                                  1.10%                1.00%                0.90%                0.90%
Fraction rRNA reads                                             0.10%                0.10%                0.10%                0.10%
Fraction mtRNA reads                                            2.40%                2.30%                6.30%                6.60%
Fraction reads unmapped                                         9.20%                9.30%                12.00%               11.90%
Median genes per cell (50k raw reads per cell)                  987                  1,016                1,311                1,343
Median genes per cell (50k mapped cell-reads per cell)          1,034                1,064                1,392                1,428
Median UMI counts per cell (50k raw reads per cell)             2,914                3,042                3,530                3,591
Median UMI counts per cell (50k mapped cell-reads per cell)     3,057                3,182                3,783                3,830
Total genes detected >1500 nt (50k raw reads per cell)          11,411               11,359               11,803               11,880
Total genes detected >1500 nt (50k mapped cell-reads per cell)  11,587               11,542               12,045               12,110
Fraction UMI counts for genes <500 nt                           4.00%                3.60%                3.20%                3.40%
Fraction UMI counts for genes 500-1000 nt                       31.70%               31.70%               25.00%               25.00%
Fraction UMI counts for genes 1000-1500 nt                      24.90%               24.80%               20.00%               20.20%
Fraction UMI counts for genes >1500 nt                          39.40%               40.00%               51.80%               51.50%
Fraction ribosomal protein UMI counts                           37.80%               37.80%               23.20%               22.80%

TABLE 10 Comparison between a CA-MMLV and Mutant Enzyme 42B in a 5′ assay

Enzyme                                                          CA-MMLV              CA-MMLV              Mutant 42B           Mutant 42B
Sample ID                                                       41621                41622                41623                41624
Description                                                     SC5′-Maxima-Rep 1    SC5′-Maxima-Rep 2    SC5′-42B-Rep 1       SC5′-42B-Rep 2
Mean Reads per Cell                                             280,128              269,343              280,848              257,745
Valid Barcodes                                                  91.30%               91.60%               84.80%               86.60%
Reads Mapped Confidently to Transcriptome                       58.80%               60.20%               50.00%               52.70%
Reads Mapped Confidently to Intergenic Regions                  3.70%                3.70%                7.60%                6.70%
Reads Mapped Confidently to Intronic Regions                    11.60%               11.30%               9.60%                9.60%
Reads Mapped Confidently to Exonic Regions                      67.50%               68.70%               58.30%               60.50%
Reads Mapped Antisense to Gene                                  5.60%                5.40%                5.40%                4.90%
Fraction rRNA reads                                             2.10%                2.00%                6.50%                5.60%
Fraction mtRNA reads                                            0.80%                0.80%                2.80%                2.30%
Fraction reads unmapped                                         10.80%               9.90%                10.10%               10.40%
Median genes per cell (50k raw reads per cell)                  602                  651                  1,365                1,367
Median genes per cell (50k mapped cell-reads per cell)          631                  675                  1,470                1,460
Median UMI counts per cell (50k raw reads per cell)             1,341                1,452                3,744                3,825
Median UMI counts per cell (50k mapped cell-reads per cell)     1,399                1,517                4,111                4,146
Total genes detected >1500 nt (50k raw reads per cell)          9,451                9,522                10,567               10,646
Total genes detected >1500 nt (50k mapped cell-reads per cell)  9,562                9,631                10,731               10,801
Fraction UMI counts for genes <500 nt                           3.90%                4.60%                4.30%                4.00%
Fraction UMI counts for genes 500-1000 nt                       37.70%               37.60%               34.60%               34.60%
Fraction UMI counts for genes 1000-1500 nt                      25.50%               25.20%               25.00%               24.90%
Fraction UMI counts for genes >1500 nt                          32.90%               32.70%               36.10%               36.50%
Fraction ribosomal protein UMI counts                           36.60%               36.30%               30.60%               30.20%

As seen in Tables 9 and 10, in both 3′ and 5′ assay formats, engineered RT enzyme 42B demonstrated an increase in sensitivity as measured by median UMIs detected per cell and median genes detected per cell. Additionally, enzyme 42B showed an increase in bias toward long genes as measured by the fraction of UMI counts for genes >1,500 nucleotides in length. In aggregate, the data show that enzyme 42B unexpectedly exhibits different mapping rates to various types of RNA, different length bias, and, importantly, results in the generation of more complex cDNA libraries (more genes/cell and more UMIs/cell) than a comparable counterpart, CA-MMLV, especially when utilized in a 5′ mRNA assay.
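The sensitivity gain described above can be quantified by averaging the two replicates per enzyme and taking the ratio of enzyme 42B to CA-MMLV. The following minimal Python sketch does this for the rows of Tables 9 and 10 reported at 50k raw reads per cell; the dictionary layout and function name are illustrative assumptions.

```python
from statistics import mean

# (CA-MMLV replicates, enzyme 42B replicates) at 50k raw reads per cell,
# copied from Table 9 (3' assay) and Table 10 (5' assay).
SENSITIVITY = {
    ("3' assay", "median genes per cell"): ((987, 1016), (1311, 1343)),
    ("3' assay", "median UMI counts per cell"): ((2914, 3042), (3530, 3591)),
    ("5' assay", "median genes per cell"): ((602, 651), (1365, 1367)),
    ("5' assay", "median UMI counts per cell"): ((1341, 1452), (3744, 3825)),
}


def fold_change(ca_replicates, mutant_replicates):
    """Ratio of the replicate mean for 42B to the replicate mean for CA-MMLV."""
    return mean(mutant_replicates) / mean(ca_replicates)


folds = {key: fold_change(*reps) for key, reps in SENSITIVITY.items()}

for (assay, metric), fold in folds.items():
    print(f"{assay:9s} {metric:27s} {fold:.2f}x")
```

Every ratio is above 1, and the 5′ assay ratios (about 2.2-fold for genes and 2.7-fold for UMIs) are larger than the 3′ assay ratios (about 1.3-fold and 1.2-fold), consistent with the statement that the gain is most pronounced in the 5′ format.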

While some embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

What is claimed is:
 1. An engineered reverse transcription enzyme, comprising an amino acid sequence that is at least 80% identical to SEQ ID NO: 3, wherein said amino acid sequence comprises: (i) a truncation of at least 15 amino acids from the N-terminus relative to SEQ ID NO: 3; and (ii) one or more mutations selected from the group consisting of an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3.
 2. The engineered reverse transcription enzyme of claim 1, wherein said one or more mutations are an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.
 3. The engineered reverse transcription enzyme of claim 1, wherein said engineered reverse transcription enzyme comprises: (i) three or more mutations selected from the group consisting of an L139 mutation, a D200 mutation, a T330 mutation, a P448 mutation, a D449 mutation, a D524 mutation, and a L603 mutation relative to SEQ ID NO: 3; and (ii) three or more mutations selected from the group consisting of an E69 mutation, an E302 mutation, a T306 mutation, a W313 mutation, an L435 mutation, and an N454 mutation relative to SEQ ID NO: 3.
 4. The engineered reverse transcription enzyme of claim 3, wherein said engineered reverse transcription enzyme comprises: (i) three or more mutations selected from the group consisting of an L139P mutation, a D200N mutation, a T330P mutation, a P448A mutation, a D449G mutation, a D524N or D524A mutation, and a L603W mutation relative to SEQ ID NO: 3; and (ii) three or more mutations selected from the group consisting of an E69K mutation, an E302R mutation, a T306K mutation, a W313F mutation, an L435G mutation, and an N454K mutation relative to SEQ ID NO: 3.
 5. The engineered reverse transcription enzyme of claim 1, wherein said engineered reverse transcription enzyme comprises: an E69 mutation, an L139 mutation, a D200 mutation, an E302 mutation, a T306 mutation, a W313 mutation, a T330 mutation, an L435 mutation, a P448 mutation, a D449 mutation, an N454 mutation, a D524 mutation, a D524 mutation, an L603 mutation, and an E607 mutation relative to SEQ ID NO: 3.
 6. The engineered reverse transcription enzyme of claim 5, wherein said engineered reverse transcription enzyme comprises: an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G mutation, a P448A mutation, a D449G mutation, an N454K mutation, a D524N or D524A mutation, an L603W mutation, and an E607K mutation relative to SEQ ID NO: 3.
 7. The engineered reverse transcription enzyme of claim 1, wherein said truncation comprises a truncation of at least 20 amino acids from said N-terminus relative to SEQ ID NO: 3.
 8. The engineered reverse transcription enzyme of claim 1, wherein said truncation comprises a truncation of 23 amino acids from said N-terminus relative to SEQ ID NO: 3.
 9. The engineered reverse transcription enzyme of claim 1, wherein said engineered reverse transcription enzyme further comprises an affinity tag at said N-terminus or at a C-terminus of said amino acid sequence.
 10. The engineered reverse transcription enzyme of claim 9, wherein said affinity tag is at least 5 histidine amino acids.
 11. The engineered reverse transcription enzyme of claim 9, wherein said engineered reverse transcription enzyme further comprises a protease cleavage sequence, wherein cleavage of said protease cleavage sequence by a protease results in cleavage of said affinity tag from said engineered reverse transcription enzyme.
 12. The engineered reverse transcription enzyme of claim 11, wherein said protease cleavage sequence is a thrombin cleavage sequence.
 13. The engineered reverse transcription enzyme of claim 12, wherein said amino acid sequence comprises a MRSSHHHHHHSSGLVPRGS (SEQ ID NO: 7) amino acid sequence at said N-terminus.
 14. The engineered reverse transcription enzyme of claim 11, wherein said engineered reverse transcription enzyme is cleaved with said protease, thereby cleaving said affinity tag from said engineered reverse transcription enzyme.
 15. The engineered reverse transcription enzyme of claim 1, wherein said engineered reverse transcription enzyme comprises an amino acid sequence according to SEQ ID NO: 6.
 16. The engineered reverse transcription enzyme of claim 15, wherein said engineered reverse transcription enzyme comprises an amino acid sequence according to SEQ ID NO: 5.
 17. A method for nucleic acid sample processing, comprising: providing a template ribonucleic acid (RNA) molecule in a reaction volume and using the engineered reverse transcription enzyme of claim 1 to reverse transcribe said RNA molecule to a complementary DNA molecule.
 18. The method of claim 17, wherein said reaction volume is less than 1 nanoliter.
 19. The method of claim 18, wherein said reaction volume is less than 500 picoliters.
 20. The method of claim 17, wherein said reaction volume is a droplet in an emulsion.
 21. The method of claim 17, wherein said reaction volume is a well.
 22. The method of claim 17, wherein said reaction volume further comprises a plurality of nucleic acid barcode molecules comprising a barcode sequence.
 23. The method of claim 22, wherein said RNA molecule is a messenger RNA (mRNA) molecule, wherein said plurality of nucleic acid barcode molecules further comprise an oligo(dT) sequence, and wherein said engineered reverse transcription enzyme reverse transcribes said mRNA molecule into said complementary DNA molecule using said oligo(dT) sequence, wherein said complementary DNA molecule comprises said barcode sequence.
 24. The method of claim 22, wherein said RNA molecule is a messenger RNA (mRNA) molecule, wherein said reaction volume further comprises a nucleic acid molecule comprising an oligo(dT) sequence, wherein said plurality of nucleic acid barcode molecules further comprise a template switching sequence, wherein said engineered reverse transcription enzyme reverse transcribes said mRNA molecule using said nucleic acid molecule comprising said oligo(dT) sequence, and wherein said engineered reverse transcription enzyme performs a template switching reaction, thereby generating said complementary DNA molecule, wherein said complementary DNA molecule comprises said barcode sequence.
 25. The method of claim 22, wherein said plurality of nucleic acid barcode molecules are attached to a support.
 26. The method of claim 25, wherein said nucleic acid barcode molecules are releasably attached to said support.
 27. The method of claim 25, wherein said support is a bead.
 28. The method of claim 27, wherein said bead is a gel bead.
 29. The method of claim 17, wherein said reaction volume comprises a cell comprising said RNA molecule.
 30. The method of claim 29, further comprising releasing said RNA molecule from said cell.